songtaohe / LaneExtraction

GNU General Public License v3.0

Error during TurningLaneValidation training #5

Closed ZiRanSlash closed 2 years ago

ZiRanSlash commented 2 years ago

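Hello!

I set up the environment according to the requirements, successfully ran the LaneAndDirectionExtraction and TurningLaneExtraction code, and trained the models. However, when I run turningLaneValidation/train, training stops at random points and does not continue.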

[ERROR] links[ 2 ][ 6 ]: (1949, 661)  is not in pos2nid
ind is : _14
pos2nid keys is :  [(1744, 1350), (1842, 674), (1527, 1675), (97, 1550), (1678, 1368), (1398, 1934), (1843, 610), (143, 1501), (416, 1732), (168, 1649), (577, 1757), (786, 842), (458, 1855), (405, 1753), (304, 1573), (186, 1661), (1713, 1447), (114, 1533), (1579, 1600), (495, 742), (1556, 1692), (616, 637), (440, 1694), (548, 632), (1859, 1071), (224, 579), (440, 1845), (559, 1775), (1872, 689), (160, 480), (1767, 1359), (1869, 995), (533, 1815), (524, 1830), (456, 1673), (291, 1596), (245, 1649), (1834, 989), (1692, 1432), (1874, 608), (1832, 1059), (262, 1629), (159, 1477), (1601, 1618), (1917, 1057), (861, 887), (292, 483), (846, 827), (1644, 1706), (620, 744)]
pos2nid is :  dict_keys([(1744, 1350), (1842, 674), (1527, 1675), (97, 1550), (1678, 1368), (1398, 1934), (1843, 610), (143, 1501), (416, 1732), (168, 1649), (577, 1757), (786, 842), (458, 1855), (405, 1753), (304, 1573), (186, 1661), (1713, 1447), (114, 1533), (1579, 1600), (495, 742), (1556, 1692), (616, 637), (440, 1694), (548, 632), (1859, 1071), (224, 579), (440, 1845), (559, 1775), (1872, 689), (160, 480), (1767, 1359), (1869, 995), (533, 1815), (524, 1830), (456, 1673), (291, 1596), (245, 1649), (1834, 989), (1692, 1432), (1874, 608), (1832, 1059), (262, 1629), (159, 1477), (1601, 1618), (1917, 1057), (861, 887), (292, 483), (846, 827), (1644, 1706), (620, 744)])
< 2 ! >

1

[ERROR] links[ 2 ][ 1 ]: (1998, 1852)  is not in pos2nid
ind is : _11
pos2nid keys is :  [(314, 1471), (888, 1885), (397, 1389), (1891, 1846), (268, 1363), (1934, 1787), (822, 1828), (690, 1597), (1918, 1866), (883, 1824), (675, 1703), (624, 1580), (552, 1679), (1903, 1783)]
pos2nid is :  dict_keys([(314, 1471), (888, 1885), (397, 1389), (1891, 1846), (268, 1363), (1934, 1787), (822, 1828), (690, 1597), (1918, 1866), (883, 1824), (675, 1703), (624, 1580), (552, 1679), (1903, 1783)])
< 2 ! >

2

 [ERROR] links[ 2 ][ 6 ]: (278, 1680)  is not in pos2nid
ind is : _14
pos2nid keys is :  [(1203, 159), (290, 924), (1538, 1247), (1270, 49), (395, 1692), (1704, 234), (304, 672), (173, 1298), (1290, 52), (207, 595), (366, 1705), (1604, 1229), (1688, 274), (295, 577), (1352, 150), (238, 1339), (1896, 1281), (360, 296), (1258, 1147), (1213, 112), (274, 1333), (231, 861), (234, 963), (1617, 1107), (210, 962), (1342, 194), (1192, 181), (1532, 194), (222, 1265), (1516, 235), (256, 868), (1335, 219), (1680, 301), (374, 1632), (252, 1267), (1223, 1097), (1541, 169), (2021, 1239), (1355, 126), (278, 662), (1512, 261), (1308, 1113), (1498, 1148), (340, 1628), (1714, 211), (1217, 94), (1927, 1168), (1592, 139), (327, 584), (1614, 144)]
pos2nid is :  dict_keys([(1203, 159), (290, 924), (1538, 1247), (1270, 49), (395, 1692), (1704, 234), (304, 672), (173, 1298), (1290, 52), (207, 595), (366, 1705), (1604, 1229), (1688, 274), (295, 577), (1352, 150), (238, 1339), (1896, 1281), (360, 296), (1258, 1147), (1213, 112), (274, 1333), (231, 861), (234, 963), (1617, 1107), (210, 962), (1342, 194), (1192, 181), (1532, 194), (222, 1265), (1516, 235), (256, 868), (1335, 219), (1680, 301), (374, 1632), (252, 1267), (1223, 1097), (1541, 169), (2021, 1239), (1355, 126), (278, 662), (1512, 261), (1308, 1113), (1498, 1148), (340, 1628), (1714, 211), (1217, 94), (1927, 1168), (1592, 139), (327, 584), (1614, 144)])
< 2 ! >

3

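After checking the code, I found that the problem is in turningLaneValidation/dataloader.py, around line 168. The code checks whether the last point of each entry in links[2] is a key of pos2nid, but some samples in the dataset do not satisfy this condition, so the final exit() is executed and the current dataloader subprocess exits. The other dataloader subprocesses then hang as well, and training does not continue.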

if (links[2][j][-1][0], links[2][j][-1][1]) not in pos2nid:
    print('\n', '[ERROR] links[', 2,'][', j,']:', (links[2][j][-1][0], links[2][j][-1][1]),' is not in pos2nid')
    print('ind is :', ind)
    print('pos2nid keys is : ', list(pos2nid.keys()))
    print('pos2nid is : ', pos2nid.keys())
    print('< 2 ! >')
    exit()

(Code screenshot 2)

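The only change I made to the code was replacing links[].keys() with list(links[].keys()), because I am using Python 3, where dict.keys() returns a view object rather than a list; wrapping it in list() converts it to a list. Apart from the print output, nothing else was changed. (Code screenshot 3)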

My dataset was generated by running create_training_data.py, and it caused no problems when training LaneAndDirectionExtraction and TurningLaneExtraction.

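I don't know how to solve this problem. I want to train on the complete dataset, and I am not sure whether the dataset contains bad data or whether training is stopped this way on purpose. I hope you can help clarify this.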

ZiRanSlash commented 2 years ago

Hi!

I set up the environment according to the requirements, successfully ran the LaneAndDirectionExtraction and TurningLaneExtraction code, and trained the models. However, when I run turningLaneValidation/train.py, training stops at random points and does not continue.

For example:

[ERROR] links[ 2 ][ 6 ]: (1949, 661)  is not in pos2nid
ind is : _14
pos2nid keys is :  [(1744, 1350), (1842, 674), (1527, 1675), (97, 1550), (1678, 1368), (1398, 1934), (1843, 610), (143, 1501), (416, 1732), (168, 1649), (577, 1757), (786, 842), (458, 1855), (405, 1753), (304, 1573), (186, 1661), (1713, 1447), (114, 1533), (1579, 1600), (495, 742), (1556, 1692), (616, 637), (440, 1694), (548, 632), (1859, 1071), (224, 579), (440, 1845), (559, 1775), (1872, 689), (160, 480), (1767, 1359), (1869, 995), (533, 1815), (524, 1830), (456, 1673), (291, 1596), (245, 1649), (1834, 989), (1692, 1432), (1874, 608), (1832, 1059), (262, 1629), (159, 1477), (1601, 1618), (1917, 1057), (861, 887), (292, 483), (846, 827), (1644, 1706), (620, 744)]
pos2nid is :  dict_keys([(1744, 1350), (1842, 674), (1527, 1675), (97, 1550), (1678, 1368), (1398, 1934), (1843, 610), (143, 1501), (416, 1732), (168, 1649), (577, 1757), (786, 842), (458, 1855), (405, 1753), (304, 1573), (186, 1661), (1713, 1447), (114, 1533), (1579, 1600), (495, 742), (1556, 1692), (616, 637), (440, 1694), (548, 632), (1859, 1071), (224, 579), (440, 1845), (559, 1775), (1872, 689), (160, 480), (1767, 1359), (1869, 995), (533, 1815), (524, 1830), (456, 1673), (291, 1596), (245, 1649), (1834, 989), (1692, 1432), (1874, 608), (1832, 1059), (262, 1629), (159, 1477), (1601, 1618), (1917, 1057), (861, 887), (292, 483), (846, 827), (1644, 1706), (620, 744)])
< 2 ! >

1

 [ERROR] links[ 2 ][ 6 ]: (278, 1680)  is not in pos2nid
ind is : _14
pos2nid keys is :  [(1203, 159), (290, 924), (1538, 1247), (1270, 49), (395, 1692), (1704, 234), (304, 672), (173, 1298), (1290, 52), (207, 595), (366, 1705), (1604, 1229), (1688, 274), (295, 577), (1352, 150), (238, 1339), (1896, 1281), (360, 296), (1258, 1147), (1213, 112), (274, 1333), (231, 861), (234, 963), (1617, 1107), (210, 962), (1342, 194), (1192, 181), (1532, 194), (222, 1265), (1516, 235), (256, 868), (1335, 219), (1680, 301), (374, 1632), (252, 1267), (1223, 1097), (1541, 169), (2021, 1239), (1355, 126), (278, 662), (1512, 261), (1308, 1113), (1498, 1148), (340, 1628), (1714, 211), (1217, 94), (1927, 1168), (1592, 139), (327, 584), (1614, 144)]
pos2nid is :  dict_keys([(1203, 159), (290, 924), (1538, 1247), (1270, 49), (395, 1692), (1704, 234), (304, 672), (173, 1298), (1290, 52), (207, 595), (366, 1705), (1604, 1229), (1688, 274), (295, 577), (1352, 150), (238, 1339), (1896, 1281), (360, 296), (1258, 1147), (1213, 112), (274, 1333), (231, 861), (234, 963), (1617, 1107), (210, 962), (1342, 194), (1192, 181), (1532, 194), (222, 1265), (1516, 235), (256, 868), (1335, 219), (1680, 301), (374, 1632), (252, 1267), (1223, 1097), (1541, 169), (2021, 1239), (1355, 126), (278, 662), (1512, 261), (1308, 1113), (1498, 1148), (340, 1628), (1714, 211), (1217, 94), (1927, 1168), (1592, 139), (327, 584), (1614, 144)])
< 2 ! >

2

[ERROR] links[ 2 ][ 1 ]: (1998, 1852)  is not in pos2nid
ind is : _11
pos2nid keys is :  [(314, 1471), (888, 1885), (397, 1389), (1891, 1846), (268, 1363), (1934, 1787), (822, 1828), (690, 1597), (1918, 1866), (883, 1824), (675, 1703), (624, 1580), (552, 1679), (1903, 1783)]
pos2nid is :  dict_keys([(314, 1471), (888, 1885), (397, 1389), (1891, 1846), (268, 1363), (1934, 1787), (822, 1828), (690, 1597), (1918, 1866), (883, 1824), (675, 1703), (624, 1580), (552, 1679), (1903, 1783)])
< 2 ! >

3

After checking the code, I found that the problem is in turningLaneValidation/dataloader.py, around line 168. The code checks whether the last point of each entry in links[2] is a key of pos2nid, but some samples in the dataset do not satisfy this condition, so exit() is called and the current dataloader subprocess exits. The other dataloader subprocesses then hang as well, and training does not continue.

if (links[2][j][-1][0], links[2][j][-1][1]) not in pos2nid:
    print('\n', '[ERROR] links[', 2,'][', j,']:', (links[2][j][-1][0], links[2][j][-1][1]),' is not in pos2nid')
    print('ind is :', ind)
    print('pos2nid keys is : ', list(pos2nid.keys()))
    print('pos2nid is : ', pos2nid.keys())
    print('< 2 ! >')
    exit()

(Code screenshot 2)
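A possible alternative to calling exit() inside the loader is to report the offending sample and let the caller skip it. The sketch below is only an illustration: `find_missing_link_endpoints` is a hypothetical helper that is not part of the repository, and it assumes that links[2] is a list of polylines whose last point is an (x, y) pair and that pos2nid is a dict keyed by (x, y) tuples, as the snippet above suggests. The toy data is made up.

```python
def find_missing_link_endpoints(links, pos2nid):
    """Return (j, endpoint) pairs for every links[2][j] whose last point
    is not a key of pos2nid. Hypothetical helper, not from the repo."""
    missing = []
    for j, polyline in enumerate(links[2]):
        endpoint = (polyline[-1][0], polyline[-1][1])
        if endpoint not in pos2nid:
            missing.append((j, endpoint))
    return missing


# Toy data, invented for illustration only.
pos2nid = {(10, 20): 0, (30, 40): 1}
links = [None, None, [[[0, 0], [10, 20]], [[5, 5], [99, 99]]]]

missing = find_missing_link_endpoints(links, pos2nid)
if missing:
    # Warn and move on to the next sample instead of killing the worker.
    print("[WARN] skipping sample, endpoints not in pos2nid:", missing)
```

Returning the mismatches rather than exiting keeps the worker process alive, so the remaining workers are not left waiting on one that has died.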

The only change I made to the code was replacing links[].keys() with list(links[].keys()), because I am using Python 3, where dict.keys() returns a view object rather than a list; wrapping it in list() converts it to a list. Apart from the print() output, nothing else was changed. (Code screenshot 3)
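The difference between the Python 3 view object and a list can be seen in a small standalone example. The dictionary contents here are made up, reusing two coordinates from the log output purely as sample keys.

```python
pos2nid = {(314, 1471): 0, (888, 1885): 1}

keys_view = pos2nid.keys()        # a dict_keys view in Python 3, not a list
keys_list = list(pos2nid.keys())  # a real list that supports indexing

first_key = keys_list[0]

# Indexing the view directly fails in Python 3.
try:
    keys_view[0]
    view_is_indexable = True
except TypeError:
    view_is_indexable = False

print(type(keys_view).__name__, first_key, view_is_indexable)
```

Code written for Python 2, where keys() returned a list, tends to break at exactly this kind of indexing, which is why wrapping the call in list() is a common porting fix.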

My dataset was created by running create_training_data.py, and there were no problems when training LaneAndDirectionExtraction and TurningLaneExtraction.

I don't know how to solve this problem. I want to train on the complete dataset, and I am not sure whether the dataset contains bad data or whether training is stopped this way on purpose.

Could you please help me solve this problem?

Thank you very much!

@songtaohe

ZiRanSlash commented 2 years ago

This problem is caused by the dataset.

In the dataset_training folder, "link_11.json" and "link_14.json" contain wrong data.

This is easy to work around: add the following code at line 77 of turningLaneValidation/dataloader.py to skip loading the wrong data:

if ind in ('_11', '_14'):  # samples whose link data fails the pos2nid check
    continue

Alternatively, you can edit the split_all.json file, remove 1 from training, and run create_training_data.py again.
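The skip-list idea can also be kept in one place rather than hard-coded in the loop. This is a minimal sketch under assumptions: `BAD_INDS` and `usable_inds` are hypothetical names, and the "_N" suffix format is inferred from the `ind is : _14` lines in the log above.

```python
# Sample suffixes reported as broken in this thread.
BAD_INDS = {"_11", "_14"}


def usable_inds(all_inds, bad_inds=BAD_INDS):
    """Return the sample suffixes that are not in the skip list, in order."""
    return [ind for ind in all_inds if ind not in bad_inds]


# Made-up range of sample suffixes for illustration.
inds = ["_%d" % i for i in range(10, 16)]
kept = usable_inds(inds)
print(kept)
```

Filtering the list before the loader starts means a newly discovered bad sample only needs to be added to the set, without touching the loading loop.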