kishwarshafin / pepper

PEPPER-Margin-DeepVariant
MIT License

problem when running PEPPER locally #50

Closed huangnengCSU closed 3 years ago

huangnengCSU commented 3 years ago

Hi, I installed PEPPER locally and ran the pepper_snp commands make_images, run_inference, and find_candidates step by step. The second step, run_inference, needs an input model path. I ran pepper download_models to get the model, but the downloaded models are not for pepper_snp. Eventually I found the trained model in your Google Cloud storage.

Best, Neng Huang

kishwarshafin commented 3 years ago

@huangnengCSU ,

Are you looking for the PEPPER models? You found the right place where they are located.

huangnengCSU commented 3 years ago

Not the PEPPER polishing model. I was looking for the model for pepper_snp run_inference. It took me some time to find these models. Maybe you could add the link to the README file.

kishwarshafin commented 3 years ago

I see, thanks for the suggestion. I will update the README in the next release. I will close this issue; please feel free to reopen it if you have any other issues.

huangnengCSU commented 3 years ago

@kishwarshafin I ran into some problems when training a pepper_snp model. The data is HG002, and the draft assembly was generated by Flye. Here is my workflow:

# raw reads: hg002_nodup_nobad.fastq
# Flye assembly: assembly.fasta
# reads-to-assembly alignment: reads2asm.sort.bam (minimap2 -ax map-ont assembly.fasta hg002.nodup.nobad.fastq -t 60 > reads2asm.sam)
# truth_hp1: HG002_GIABv332_truths_HG002_GIABv332_2_GRCh38_no_alt_hp1.fa
# truth_hp2: HG002_GIABv332_truths_HG002_GIABv332_2_GRCh38_no_alt_hp2.fa
# truth_hp1-to-assembly alignment: truth_h1.sorted.bam (minimap2 -ax asm5 assembly.fasta HG002_GIABv332_truths_HG002_GIABv332_2_GRCh38_no_alt_hp1.fa -t 40 > truth_h1.sam)
# truth_hp2-to-assembly alignment: truth_h2.sorted.bam (minimap2 -ax asm5 assembly.fasta HG002_GIABv332_truths_HG002_GIABv332_2_GRCh38_no_alt_hp2.fa -t 40 > truth_h2.sam)
pepper_snp_train make_train_images -b reads2asm.sort.bam -f assembly.fasta -tb1 truth_h1.sorted.bam -tb2 truth_h2.sorted.bam -o HG002_train_images -t 60

[04-27-2021 15:08:45] INFO: MAKE TRAIN IMAGE MODULE SELECTED
[04-27-2021 15:08:45] INFO: COMMON CONTIGS FOUND: ['contig_1', 'contig_2', 'contig_5', 'contig_6', 'contig_7', 'contig_9', 'contig_10', 'contig_11', 'contig_12', 'contig_13', 'contig_14', 'contig_17', 'contig_20', 'contig_21', 'contig_22', 'contig_23', 'contig_24', 'contig_25', 'contig_26', 'contig_27', 'contig_29', 'contig_30', 'contig_31', 'contig_33', 'contig_34', 'contig_35', 'contig_36', 'contig_37', 'contig_39', 'contig_40', 'contig_41', 'contig_42', 'contig_43', 'contig_44', 'contig_45', 'contig_46', 'contig_47', 'contig_48', 'contig_52', 'contig_53', 'contig_54', 'contig_55', 'contig_56', 'contig_58', 'contig_59', 'contig_60', 'contig_61', 'contig_62', 'contig_63', 'contig_64', 'contig_65', 'contig_66', 'contig_68', 'contig_69', 'contig_70', 'contig_73', 'contig_74', 'contig_75', 'contig_76', 'contig_77', 'contig_78', 'contig_81', 'contig_84', 'contig_86', 'contig_87', 'contig_88', 'contig_89', 'contig_91', 'contig_93', 'contig_94', 'contig_95', 'contig_96', 'contig_97', 'contig_100', 'contig_101', 'contig_103', 'contig_104', 'contig_105', 'contig_107', 'contig_108', 'contig_109', 'contig_110', 'contig_111', 'contig_112', 'contig_113', 'contig_115', 'contig_116', 'contig_117', 'contig_123', 'contig_124', 'contig_125', 'contig_126', 'contig_127', 'contig_129', 'contig_131', 'contig_132', 'contig_135', 'contig_137', 'contig_138', 'contig_139', 'contig_140', 'contig_141', 'contig_143', 'contig_144', 'contig_145', 'contig_146', 'contig_147', 'contig_148', 'contig_149', 'contig_150', 'contig_151', 'contig_153', 'contig_154', 'contig_155', 'contig_156', 'contig_158', 'contig_159', 'contig_160', 'contig_162', 'contig_163', 'contig_164', 'contig_165', 'contig_166', 'contig_167', 'contig_168', 'contig_169', 'contig_170', 'contig_172', 'contig_175', 'contig_178', 'contig_179', 'contig_180', 'contig_181', 'contig_192', 'contig_195', 'contig_197', 'contig_199', 'contig_202', 'contig_203', 'contig_204', 'contig_205', 'contig_206', 'contig_207', 'contig_208', 
'contig_210', 'contig_211', 'contig_213', 'contig_214', 'contig_216', 'contig_217', 'contig_223', 'contig_224', 'contig_225', 'contig_226', 'contig_227', 'contig_229', 'contig_230', 'contig_231', 'contig_232', 'contig_234', 'contig_235', 'contig_236', 'contig_237', 'contig_241', 'contig_242', 'contig_243', 'contig_246', 'contig_248', 'contig_249', 'contig_250', 'contig_251', 'contig_254', 'contig_256', 'contig_257', 'contig_258', 'contig_259', 'contig_260', 'contig_261', 'contig_264', 'contig_267', 'contig_268', 'contig_270', 'contig_271', 'contig_274', 'contig_275', 'contig_277', 'contig_279', 'contig_281', 'contig_284', 'contig_285', 'contig_290', 'contig_293', 'contig_295', 'contig_296', 'contig_297', 'contig_298', 'contig_299', 'contig_302', 'contig_303', 'contig_304', 'contig_307', 'contig_309', 'contig_310', 'contig_313', 'contig_314', 'contig_316', 'contig_317', 'contig_318', 'contig_324', 'contig_327', 'contig_329', 'contig_336', 'contig_343', 'contig_346', 'contig_347', 'contig_349', 'contig_351', 'contig_358', 'contig_359', 'contig_361', 'contig_364', 'contig_369', 'contig_370', 'contig_372', 'contig_382', 'contig_387', 'contig_388', 'contig_391', 'contig_394', 'contig_397', 'contig_401', 'contig_403', 'contig_406', 'contig_409', 'contig_411', 'contig_412', 'contig_421', 'contig_423', 'contig_424', 'contig_425', 'contig_429', 'contig_435', 'contig_439', 'contig_443', 'contig_444', 'contig_449', 'contig_455', 'contig_458', 'contig_460', 'contig_470', 'contig_476', 'contig_485', 'contig_488', 'contig_489', 'contig_492', 'contig_493', 'contig_494', 'contig_505', 'contig_506', 'contig_508', 'contig_523', 'contig_527', 'contig_539', 'contig_545', 'contig_546', 'contig_547', 'contig_549', 'contig_558', 'contig_559', 'contig_563', 'contig_564', 'contig_566', 'contig_568', 'contig_569', 'contig_571', 'contig_572', 'contig_573', 'contig_577', 'contig_579', 'contig_581', 'contig_584', 'contig_585', 'contig_587', 'contig_588', 'contig_590', 'contig_593', 
'contig_595', 'contig_597', 'contig_599', 'contig_600', 'contig_601', 'contig_602', 'contig_603', 'contig_604', 'contig_605', 'contig_606', 'contig_608', 'contig_609', 'contig_612', 'contig_613', 'contig_614', 'contig_616', 'contig_618', 'contig_624', 'contig_626', 'contig_627', 'contig_629', 'contig_630', 'contig_631', 'contig_632', 'contig_635', 'contig_640', 'contig_642', 'contig_643', 'contig_645', 'contig_654', 'contig_662', 'contig_664', 'contig_665', 'contig_670', 'contig_671', 'contig_672', 'contig_673', 'contig_674', 'contig_676', 'contig_679', 'contig_683', 'contig_685', 'contig_687', 'contig_694', 'contig_695', 'contig_696', 'contig_697', 'contig_698', 'contig_707', 'contig_708', 'contig_709', 'contig_710', 'contig_711', 'contig_712', 'contig_713', 'contig_717', 'contig_718', 'contig_720', 'contig_725', 'contig_726', 'contig_727', 'contig_728', 'contig_731', 'contig_733', 'contig_735', 'contig_740', 'contig_741', 'contig_742', 'contig_752', 'contig_755', 'contig_757', 'contig_758', 'contig_759', 'contig_760', 'contig_764', 'contig_765', 'contig_768', 'contig_769', 'contig_772', 'contig_774', 'contig_776', 'contig_777', 'contig_782', 'contig_786', 'contig_789', 'contig_790', 'contig_791', 'contig_792', 'contig_797', 'contig_798', 'contig_806', 'contig_807', 'contig_812', 'contig_817', 'contig_821', 'contig_823', 'contig_826', 'contig_827', 'contig_834', 'contig_835', 'contig_839', 'contig_840', 'contig_842', 'contig_845', 'contig_848', 'contig_853', 'contig_855', 'contig_858', 'contig_861', 'contig_863', 'contig_864', 'contig_865', 'contig_869', 'contig_872', 'contig_876', 'contig_878', 'contig_880', 'contig_885', 'contig_886', 'contig_887', 'contig_888', 'contig_892', 'contig_893', 'contig_894', 'contig_898', 'contig_900', 'contig_903', 'contig_904', 'contig_912', 'contig_915', 'contig_922', 'contig_923', 'contig_924', 'contig_928', 'contig_937', 'contig_939', 'contig_942', 'contig_943', 'contig_944', 'contig_946', 'contig_947', 'contig_951', 
'contig_956', 'contig_958', 'contig_959', 'contig_960', 'contig_962', 'contig_963', 'contig_964', 'contig_966', 'contig_972', 'contig_974', 'contig_984', 'contig_986', 'contig_990', 'contig_991', 'contig_994', 'contig_1000', 'contig_1003', 'contig_1005', 'contig_1007', 'contig_1008', 'contig_1012', 'contig_1015', 'contig_1018', 'contig_1022', 'contig_1031', 'contig_1032', 'contig_1036', 'contig_1037', 'contig_1038', 'contig_1040', 'contig_1043', 'contig_1044', 'contig_1047', 'contig_1048', 'contig_1056', 'contig_1057', 'contig_1061', 'contig_1062', 'contig_1064', 'contig_1065', 'contig_1066', 'contig_1069', 'contig_1072', 'contig_1075', 'contig_1077', 'contig_1085', 'contig_1088', 'contig_1089', 'contig_1090', 'contig_1097', 'contig_1102', 'contig_1104', 'contig_1108', 'contig_1110', 'contig_1111', 'contig_1112', 'contig_1115', 'contig_1116', 'contig_1118', 'contig_1122', 'contig_1125', 'contig_1126', 'contig_1127', 'contig_1134', 'contig_1137', 'contig_1145', 'contig_1147', 'contig_1148', 'contig_1151', 'contig_1159', 'contig_1160', 'contig_1161', 'contig_1162', 'contig_1163', 'contig_1164', 'contig_1165', 'contig_1168', 'contig_1169', 'contig_1171', 'contig_1182', 'contig_1183', 'contig_1186', 'contig_1188', 'contig_1191', 'contig_1192', 'contig_1193', 'contig_1194', 'contig_1196', 'contig_1198', 'contig_1200', 'contig_1203', 'contig_1205', 'contig_1206', 'contig_1210', 'contig_1213', 'contig_1214', 'contig_1215', 'contig_1216', 'contig_1217', 'contig_1218', 'contig_1220', 'contig_1228', 'contig_1230', 'contig_1235', 'contig_1236', 'contig_1244', 'contig_1245', 'contig_1248', 'contig_1253', 'contig_1254', 'contig_1258', 'contig_1260', 'contig_1269', 'contig_1277', 'contig_1282', 'contig_1284', 'contig_1287', 'contig_1290', 'contig_1296', 'contig_1297', 'contig_1299', 'contig_1301', 'contig_1317', 'contig_1323', 'contig_1324', 'contig_1333', 'contig_1335', 'contig_1338', 'contig_1339', 'contig_1341', 'contig_1344', 'contig_1346', 'contig_1347', 'contig_1350', 
'contig_1355', 'contig_1357', 'contig_1360', 'contig_1361', 'contig_1367', 'contig_1368', 'contig_1369', 'contig_1370', 'contig_1372', 'contig_1373', 'contig_1375', 'contig_1376', 'contig_1378', 'contig_1385', 'contig_1387', 'contig_1390', 'contig_1391', 'contig_1392', 'contig_1393', 'contig_1398', 'contig_1399', 'contig_1406', 'contig_1410', 'contig_1413', 'contig_1427', 'contig_1430', 'contig_1431', 'contig_1435', 'contig_1437', 'contig_1440', 'contig_1443', 'contig_1445', 'contig_1452', 'contig_1454', 'contig_1457', 'contig_1462', 'contig_1463', 'contig_1464', 'contig_1466', 'contig_1467', 'contig_1470', 'contig_1471', 'contig_1473', 'contig_1475', 'contig_1483', 'contig_1484', 'contig_1485', 'contig_1488', 'contig_1491', 'contig_1495', 'contig_1496', 'contig_1503', 'contig_1505', 'contig_1506', 'contig_1523', 'contig_1525', 'contig_1526', 'contig_1527', 'contig_1528', 'contig_1531', 'contig_1533', 'contig_1534', 'contig_1536', 'contig_1540', 'contig_1545', 'contig_1547', 'contig_1549', 'contig_1550', 'contig_1551', 'contig_1552', 'contig_1557', 'contig_1559', 'contig_1563', 'contig_1564', 'contig_1571', 'contig_1573', 'contig_1576', 'contig_1577', 'contig_1578', 'contig_1583', 'contig_1589', 'contig_1590', 'contig_1591', 'contig_1593', 'contig_1596', 'contig_1608', 'contig_1611', 'contig_1620', 'scaffold_4', 'scaffold_28', 'scaffold_32', 'scaffold_49', 'scaffold_50', 'scaffold_136', 'scaffold_191', 'scaffold_244', 'scaffold_245', 'scaffold_269', 'scaffold_286', 'scaffold_389', 'scaffold_459', 'scaffold_669', 'scaffold_800', 'scaffold_993', 'scaffold_996', 'scaffold_1446']
[04-27-2021 15:08:45] INFO: TOTAL CONTIGS: 668 TOTAL INTERVALS: 26920
[04-27-2021 15:08:45] STARTING THREAD: 0 FOR 449 INTERVALS
ERROR: 'NoneType' object has no attribute 'keys'
[the same ERROR line appears 60 times in total]
[04-27-2021 15:08:46] FINISHED IMAGE GENERATION
[04-27-2021 15:08:46] TOTAL ELAPSED TIME FOR IMAGE GENERATION: 0 Min 0 Sec
kishwarshafin commented 3 years ago

@huangnengCSU ,

Can you please run pepper --version and post the output here?

huangnengCSU commented 3 years ago

Hi, the PEPPER version is 0.4.1.

kishwarshafin commented 3 years ago

@huangnengCSU ,

It looks like the training sync was broken in v0.4.1 because of a small change in the datatype that was not synced. It should be fixed in the next push. However, you should be able to run the exact same command from within the docker if you want to try training a model now.

huangnengCSU commented 3 years ago

Thanks, I will try it soon.

huangnengCSU commented 3 years ago

Hi, I ran the training command within Docker but ran into the same problem. Here is my command:

sudo docker run --ipc=host -v "/homeb/data/PEPPER_HG002_training":"/homeb/data/PEPPER_HG002_training" -v "/homeb/data/PEPPER_HG002_training":"/homeb/data/PEPPER_HG002_training" kishwars/pepper_deepvariant:r0.4 pepper_snp_train make_train_images -b /homeb/data/PEPPER_HG002_training/reads2asm.sort.bam -f /homeb/data/PEPPER_HG002_training/assembly.fasta -tb1 /homeb/data/PEPPER_HG002_training/truth_h1.sorted.bam -tb2 /homeb/data/PEPPER_HG002_training/truth_h2.sorted.bam -o /homeb/data/PEPPER_HG002_training/HG002_train_images -t 60

The output information is as follows:

[05-10-2021 07:57:20] INFO: MAKE TRAIN IMAGE MODULE SELECTED
[05-10-2021 07:57:20] INFO: COMMON CONTIGS FOUND: ['contig_1', 'contig_2', 'contig_5', 'contig_6', 'contig_7', 'contig_9', 'contig_10', 'contig_11', 'contig_12', 'contig_13', 'contig_14', 'contig_17', 'contig_20', 'contig_21', 'contig_22', 'contig_23', 'contig_24', 'contig_25', 'contig_26', 'contig_27', 'contig_29', 'contig_30', 'contig_31', 'contig_33', 'contig_34', 'contig_35', 'contig_36', 'contig_37', 'contig_39', 'contig_40', 'contig_41', 'contig_42', 'contig_43', 'contig_44', 'contig_45', 'contig_46', 'contig_47', 'contig_48', 'contig_52', 'contig_53', 'contig_54', 'contig_55', 'contig_56', 'contig_58', 'contig_59', 'contig_60', 'contig_61', 'contig_62', 'contig_63', 'contig_64', 'contig_65', 'contig_66', 'contig_68', 'contig_69', 'contig_70', 'contig_73', 'contig_74', 'contig_75', 'contig_76', 'contig_77', 'contig_78', 'contig_81', 'contig_84', 'contig_86', 'contig_87', 'contig_88', 'contig_89', 'contig_91', 'contig_93', 'contig_94', 'contig_95', 'contig_96', 'contig_97', 'contig_100', 'contig_101', 'contig_103', 'contig_104', 'contig_105', 'contig_107', 'contig_108', 'contig_109', 'contig_110', 'contig_111', 'contig_112', 'contig_113', 'contig_115', 'contig_116', 'contig_117', 'contig_123', 'contig_124', 'contig_125', 'contig_126', 'contig_127', 'contig_129', 'contig_131', 'contig_132', 'contig_135', 'contig_137', 'contig_138', 'contig_139', 'contig_140', 'contig_141', 'contig_143', 'contig_144', 'contig_145', 'contig_146', 'contig_147', 'contig_148', 'contig_149', 'contig_150', 'contig_151', 'contig_153', 'contig_154', 'contig_155', 'contig_156', 'contig_158', 'contig_159', 'contig_160', 'contig_162', 'contig_163', 'contig_164', 'contig_165', 'contig_166', 'contig_167', 'contig_168', 'contig_169', 'contig_170', 'contig_172', 'contig_175', 'contig_178', 'contig_179', 'contig_180', 'contig_181', 'contig_192', 'contig_195', 'contig_197', 'contig_199', 'contig_202', 'contig_203', 'contig_204', 'contig_205', 'contig_206', 'contig_207', 'contig_208', 
'contig_210', 'contig_211', 'contig_213', 'contig_214', 'contig_216', 'contig_217', 'contig_223', 'contig_224', 'contig_225', 'contig_226', 'contig_227', 'contig_229', 'contig_230', 'contig_231', 'contig_232', 'contig_234', 'contig_235', 'contig_236', 'contig_237', 'contig_241', 'contig_242', 'contig_243', 'contig_246', 'contig_248', 'contig_249', 'contig_250', 'contig_251', 'contig_254', 'contig_256', 'contig_257', 'contig_258', 'contig_259', 'contig_260', 'contig_261', 'contig_264', 'contig_267', 'contig_268', 'contig_270', 'contig_271', 'contig_274', 'contig_275', 'contig_277', 'contig_279', 'contig_281', 'contig_284', 'contig_285', 'contig_290', 'contig_293', 'contig_295', 'contig_296', 'contig_297', 'contig_298', 'contig_299', 'contig_302', 'contig_303', 'contig_304', 'contig_307', 'contig_309', 'contig_310', 'contig_313', 'contig_314', 'contig_316', 'contig_317', 'contig_318', 'contig_324', 'contig_327', 'contig_329', 'contig_336', 'contig_343', 'contig_346', 'contig_347', 'contig_349', 'contig_351', 'contig_358', 'contig_359', 'contig_361', 'contig_364', 'contig_369', 'contig_370', 'contig_372', 'contig_382', 'contig_387', 'contig_388', 'contig_391', 'contig_394', 'contig_397', 'contig_401', 'contig_403', 'contig_406', 'contig_409', 'contig_411', 'contig_412', 'contig_421', 'contig_423', 'contig_424', 'contig_425', 'contig_429', 'contig_435', 'contig_439', 'contig_443', 'contig_444', 'contig_449', 'contig_455', 'contig_458', 'contig_460', 'contig_470', 'contig_476', 'contig_485', 'contig_488', 'contig_489', 'contig_492', 'contig_493', 'contig_494', 'contig_505', 'contig_506', 'contig_508', 'contig_523', 'contig_527', 'contig_539', 'contig_545', 'contig_546', 'contig_547', 'contig_549', 'contig_558', 'contig_559', 'contig_563', 'contig_564', 'contig_566', 'contig_568', 'contig_569', 'contig_571', 'contig_572', 'contig_573', 'contig_577', 'contig_579', 'contig_581', 'contig_584', 'contig_585', 'contig_587', 'contig_588', 'contig_590', 'contig_593', 
'contig_595', 'contig_597', 'contig_599', 'contig_600', 'contig_601', 'contig_602', 'contig_603', 'contig_604', 'contig_605', 'contig_606', 'contig_608', 'contig_609', 'contig_612', 'contig_613', 'contig_614', 'contig_616', 'contig_618', 'contig_624', 'contig_626', 'contig_627', 'contig_629', 'contig_630', 'contig_631', 'contig_632', 'contig_635', 'contig_640', 'contig_642', 'contig_643', 'contig_645', 'contig_654', 'contig_662', 'contig_664', 'contig_665', 'contig_670', 'contig_671', 'contig_672', 'contig_673', 'contig_674', 'contig_676', 'contig_679', 'contig_683', 'contig_685', 'contig_687', 'contig_694', 'contig_695', 'contig_696', 'contig_697', 'contig_698', 'contig_707', 'contig_708', 'contig_709', 'contig_710', 'contig_711', 'contig_712', 'contig_713', 'contig_717', 'contig_718', 'contig_720', 'contig_725', 'contig_726', 'contig_727', 'contig_728', 'contig_731', 'contig_733', 'contig_735', 'contig_740', 'contig_741', 'contig_742', 'contig_752', 'contig_755', 'contig_757', 'contig_758', 'contig_759', 'contig_760', 'contig_764', 'contig_765', 'contig_768', 'contig_769', 'contig_772', 'contig_774', 'contig_776', 'contig_777', 'contig_782', 'contig_786', 'contig_789', 'contig_790', 'contig_791', 'contig_792', 'contig_797', 'contig_798', 'contig_806', 'contig_807', 'contig_812', 'contig_817', 'contig_821', 'contig_823', 'contig_826', 'contig_827', 'contig_834', 'contig_835', 'contig_839', 'contig_840', 'contig_842', 'contig_845', 'contig_848', 'contig_853', 'contig_855', 'contig_858', 'contig_861', 'contig_863', 'contig_864', 'contig_865', 'contig_869', 'contig_872', 'contig_876', 'contig_878', 'contig_880', 'contig_885', 'contig_886', 'contig_887', 'contig_888', 'contig_892', 'contig_893', 'contig_894', 'contig_898', 'contig_900', 'contig_903', 'contig_904', 'contig_912', 'contig_915', 'contig_922', 'contig_923', 'contig_924', 'contig_928', 'contig_937', 'contig_939', 'contig_942', 'contig_943', 'contig_944', 'contig_946', 'contig_947', 'contig_951', 
'contig_956', 'contig_958', 'contig_959', 'contig_960', 'contig_962', 'contig_963', 'contig_964', 'contig_966', 'contig_972', 'contig_974', 'contig_984', 'contig_986', 'contig_990', 'contig_991', 'contig_994', 'contig_1000', 'contig_1003', 'contig_1005', 'contig_1007', 'contig_1008', 'contig_1012', 'contig_1015', 'contig_1018', 'contig_1022', 'contig_1031', 'contig_1032', 'contig_1036', 'contig_1037', 'contig_1038', 'contig_1040', 'contig_1043', 'contig_1044', 'contig_1047', 'contig_1048', 'contig_1056', 'contig_1057', 'contig_1061', 'contig_1062', 'contig_1064', 'contig_1065', 'contig_1066', 'contig_1069', 'contig_1072', 'contig_1075', 'contig_1077', 'contig_1085', 'contig_1088', 'contig_1089', 'contig_1090', 'contig_1097', 'contig_1102', 'contig_1104', 'contig_1108', 'contig_1110', 'contig_1111', 'contig_1112', 'contig_1115', 'contig_1116', 'contig_1118', 'contig_1122', 'contig_1125', 'contig_1126', 'contig_1127', 'contig_1134', 'contig_1137', 'contig_1145', 'contig_1147', 'contig_1148', 'contig_1151', 'contig_1159', 'contig_1160', 'contig_1161', 'contig_1162', 'contig_1163', 'contig_1164', 'contig_1165', 'contig_1168', 'contig_1169', 'contig_1171', 'contig_1182', 'contig_1183', 'contig_1186', 'contig_1188', 'contig_1191', 'contig_1192', 'contig_1193', 'contig_1194', 'contig_1196', 'contig_1198', 'contig_1200', 'contig_1203', 'contig_1205', 'contig_1206', 'contig_1210', 'contig_1213', 'contig_1214', 'contig_1215', 'contig_1216', 'contig_1217', 'contig_1218', 'contig_1220', 'contig_1228', 'contig_1230', 'contig_1235', 'contig_1236', 'contig_1244', 'contig_1245', 'contig_1248', 'contig_1253', 'contig_1254', 'contig_1258', 'contig_1260', 'contig_1269', 'contig_1277', 'contig_1282', 'contig_1284', 'contig_1287', 'contig_1290', 'contig_1296', 'contig_1297', 'contig_1299', 'contig_1301', 'contig_1317', 'contig_1323', 'contig_1324', 'contig_1333', 'contig_1335', 'contig_1338', 'contig_1339', 'contig_1341', 'contig_1344', 'contig_1346', 'contig_1347', 'contig_1350', 
'contig_1355', 'contig_1357', 'contig_1360', 'contig_1361', 'contig_1367', 'contig_1368', 'contig_1369', 'contig_1370', 'contig_1372', 'contig_1373', 'contig_1375', 'contig_1376', 'contig_1378', 'contig_1385', 'contig_1387', 'contig_1390', 'contig_1391', 'contig_1392', 'contig_1393', 'contig_1398', 'contig_1399', 'contig_1406', 'contig_1410', 'contig_1413', 'contig_1427', 'contig_1430', 'contig_1431', 'contig_1435', 'contig_1437', 'contig_1440', 'contig_1443', 'contig_1445', 'contig_1452', 'contig_1454', 'contig_1457', 'contig_1462', 'contig_1463', 'contig_1464', 'contig_1466', 'contig_1467', 'contig_1470', 'contig_1471', 'contig_1473', 'contig_1475', 'contig_1483', 'contig_1484', 'contig_1485', 'contig_1488', 'contig_1491', 'contig_1495', 'contig_1496', 'contig_1503', 'contig_1505', 'contig_1506', 'contig_1523', 'contig_1525', 'contig_1526', 'contig_1527', 'contig_1528', 'contig_1531', 'contig_1533', 'contig_1534', 'contig_1536', 'contig_1540', 'contig_1545', 'contig_1547', 'contig_1549', 'contig_1550', 'contig_1551', 'contig_1552', 'contig_1557', 'contig_1559', 'contig_1563', 'contig_1564', 'contig_1571', 'contig_1573', 'contig_1576', 'contig_1577', 'contig_1578', 'contig_1583', 'contig_1589', 'contig_1590', 'contig_1591', 'contig_1593', 'contig_1596', 'contig_1608', 'contig_1611', 'contig_1620', 'scaffold_4', 'scaffold_28', 'scaffold_32', 'scaffold_49', 'scaffold_50', 'scaffold_136', 'scaffold_191', 'scaffold_244', 'scaffold_245', 'scaffold_269', 'scaffold_286', 'scaffold_389', 'scaffold_459', 'scaffold_669', 'scaffold_800', 'scaffold_993', 'scaffold_996', 'scaffold_1446']
[05-10-2021 07:57:20] INFO: TOTAL CONTIGS: 668 TOTAL INTERVALS: 26920
[05-10-2021 07:57:20] STARTING THREAD: 0 FOR 449 INTERVALS
ERROR: 'NoneType' object has no attribute 'keys'
[the same ERROR line appears 60 times in total]
[05-10-2021 07:57:21] FINISHED IMAGE GENERATION
[05-10-2021 07:57:21] TOTAL ELAPSED TIME FOR IMAGE GENERATION: 0 Min 0 Sec
kishwarshafin commented 3 years ago

@huangnengCSU ,

Is it possible for you to share the data? I think there's an exception in the codebase but it's not happening with the data I have locally.
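For readers following along: the repeated log line is the message Python produces when `.keys()` is called on `None`. A later comment in this thread traces it to a missing region_bed in get_chromosome_list. The sketch below is not PEPPER's code; `load_regions`, `contigs_unguarded`, and `contigs_guarded` are hypothetical names that only illustrate how an optional input left unset can surface as this error, and how a guard avoids it.

```python
from typing import Dict, List, Optional, Tuple

Region = Tuple[int, int]


def load_regions(bed_path: Optional[str]) -> Optional[Dict[str, List[Region]]]:
    """Hypothetical loader: returns None when no BED path is supplied."""
    if bed_path is None:
        return None
    regions: Dict[str, List[Region]] = {}
    with open(bed_path) as handle:
        for line in handle:
            fields = line.split()
            # Skip blank, short, and header lines.
            if len(fields) < 3 or line.startswith(("#", "track", "browser")):
                continue
            regions.setdefault(fields[0], []).append((int(fields[1]), int(fields[2])))
    return regions


def contigs_unguarded(bed_path: Optional[str]) -> List[str]:
    # Raises AttributeError when load_regions() returned None.
    return sorted(load_regions(bed_path).keys())


def contigs_guarded(bed_path: Optional[str], all_contigs: List[str]) -> List[str]:
    # Falls back to every contig when no regions were given.
    regions = load_regions(bed_path)
    if regions is None:
        return list(all_contigs)
    return [name for name in all_contigs if name in regions]


def error_message_without_bed() -> str:
    # Reproduce the failure and return the exception text.
    try:
        contigs_unguarded(None)
    except AttributeError as err:
        return str(err)
    return ""
```

Calling `error_message_without_bed()` returns the same text as the ERROR lines in the logs, while `contigs_guarded(None, names)` simply returns all names.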

huangnengCSU commented 3 years ago

ONT raw reads: HG002 (https://github.com/human-pangenomics/HG002_Data_Freeze_v1.0)
Assembly: assembled by Flye (2.6-release)
Truth hp1: HG002_GIABv332_truths_HG002_GIABv332_2_GRCh38_no_alt_hp1.fa (kishwar-helen/HG002_GIABv332_truths/)
Truth hp2: HG002_GIABv332_truths_HG002_GIABv332_2_GRCh38_no_alt_hp2.fa (kishwar-helen/HG002_GIABv332_truths/)
Aligner: minimap2 (2.17-r974-dirty)

step1: mapping raw reads to draft assembly
step2: mapping truth hp1 to draft assembly
step3: mapping truth hp2 to draft assembly
step4: make_train_images

pepper_snp_train make_train_images (failed)
pepper_snp make_images (succeeded)

huangnengCSU commented 3 years ago

Hi, when I dug into the code, I found that the previous problem is caused by the missing region_bed in the function get_chromosome_list. So I changed the input data of the training workflow as follows:

ONT raw reads: HG002 (https://github.com/human-pangenomics/HG002_Data_Freeze_v1.0)
Assembly: GRCh38
Truth hp1: HG002_GIABv332_truths_HG002_GIABv332_2_GRCh38_no_alt_hp1.fa (kishwar-helen/HG002_GIABv332_truths/)
Truth hp2: HG002_GIABv332_truths_HG002_GIABv332_2_GRCh38_no_alt_hp2.fa (kishwar-helen/HG002_GIABv332_truths/)
Region_bed: HG002_SVs_Tier1_v0.6.bed (https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/release/AshkenazimTrio/HG002_NA24385_son/NIST_SV_v0.6/)
Aligner: minimap2 (2.17-r974-dirty)

The command is as follows:

pepper_snp_train make_train_images -b reads2ref.sort.bam -f GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna -tb1 truth_hp1.sort.bam -tb2 truth_hp2.sort.bam -o HG002_train_images -t 60 -rb HG002_SVs_Tier1_v0.6.bed

The command no longer gives an error message. But there is another problem: the output HDF file is empty.
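One generic thing worth ruling out when a region-restricted run produces no output is whether the BED file and the reference use the same contig names. This is only a sanity-check sketch, not something the thread confirms as the cause; the function names are hypothetical and the toy inputs in the usage note are made up.

```python
from typing import Iterable, Set


def fasta_contigs(lines: Iterable[str]) -> Set[str]:
    """Contig names from FASTA headers (text after '>' up to the first whitespace)."""
    names: Set[str] = set()
    for line in lines:
        if line.startswith(">"):
            header = line[1:].split()
            if header:
                names.add(header[0])
    return names


def bed_contigs(lines: Iterable[str]) -> Set[str]:
    """Contig names from the first column of BED records, skipping header lines."""
    names: Set[str] = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 3 and not line.startswith(("#", "track", "browser")):
            names.add(fields[0])
    return names


def shared_contigs(fasta_lines: Iterable[str], bed_lines: Iterable[str]) -> Set[str]:
    """Names present in both files; an empty set means no BED region can match."""
    return fasta_contigs(fasta_lines) & bed_contigs(bed_lines)
```

With real data the two arguments can be open file handles, for example `shared_contigs(open("reference.fna"), open("regions.bed"))` (placeholder file names). A toy call such as `shared_contigs([">chr1"], ["1\t100\t200"])` returns an empty set because `chr1` and `1` do not match.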

[05-13-2021 16:22:58] INFO: MAKE TRAIN IMAGE MODULE SELECTED
[05-13-2021 16:22:58] INFO: COMMON CONTIGS FOUND: ['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr20', 'chr21', 'chr22', 'chrM', 'chrX', 'chrY']
[05-13-2021 16:22:58] INFO: TOTAL CONTIGS: 25 TOTAL INTERVALS: 30895
[05-13-2021 16:22:58] STARTING THREAD: 0 FOR 515 INTERVALS
[05-13-2021 16:22:59] INFO:  10/515 COMPLETE (1%) [ELAPSED TIME: 0 Min 0 Sec]
[05-13-2021 16:22:59] INFO:  20/515 COMPLETE (3%) [ELAPSED TIME: 0 Min 1 Sec]
[05-13-2021 16:23:00] INFO:  30/515 COMPLETE (5%) [ELAPSED TIME: 0 Min 2 Sec]
[05-13-2021 16:23:01] INFO:  40/515 COMPLETE (7%) [ELAPSED TIME: 0 Min 2 Sec]
[05-13-2021 16:23:01] INFO:  50/515 COMPLETE (9%) [ELAPSED TIME: 0 Min 3 Sec]
[05-13-2021 16:23:02] INFO:  60/515 COMPLETE (11%) [ELAPSED TIME: 0 Min 4 Sec]
[05-13-2021 16:23:03] INFO:  70/515 COMPLETE (13%) [ELAPSED TIME: 0 Min 5 Sec]
[05-13-2021 16:23:04] INFO:  80/515 COMPLETE (15%) [ELAPSED TIME: 0 Min 6 Sec]
[05-13-2021 16:23:05] INFO:  90/515 COMPLETE (17%) [ELAPSED TIME: 0 Min 6 Sec]
[05-13-2021 16:23:06] INFO:  100/515 COMPLETE (19%) [ELAPSED TIME: 0 Min 7 Sec]
[05-13-2021 16:23:06] INFO:  110/515 COMPLETE (21%) [ELAPSED TIME: 0 Min 8 Sec]
[05-13-2021 16:23:07] INFO:  120/515 COMPLETE (23%) [ELAPSED TIME: 0 Min 8 Sec]
[05-13-2021 16:23:07] INFO:  130/515 COMPLETE (25%) [ELAPSED TIME: 0 Min 9 Sec]
[05-13-2021 16:23:08] INFO:  140/515 COMPLETE (27%) [ELAPSED TIME: 0 Min 9 Sec]
[05-13-2021 16:23:09] INFO:  150/515 COMPLETE (29%) [ELAPSED TIME: 0 Min 10 Sec]
[05-13-2021 16:23:10] INFO:  160/515 COMPLETE (31%) [ELAPSED TIME: 0 Min 11 Sec]
[05-13-2021 16:23:10] INFO:  170/515 COMPLETE (33%) [ELAPSED TIME: 0 Min 12 Sec]
[05-13-2021 16:23:11] INFO:  180/515 COMPLETE (34%) [ELAPSED TIME: 0 Min 13 Sec]
[05-13-2021 16:23:12] INFO:  190/515 COMPLETE (36%) [ELAPSED TIME: 0 Min 13 Sec]
[05-13-2021 16:23:13] INFO:  200/515 COMPLETE (38%) [ELAPSED TIME: 0 Min 14 Sec]
[05-13-2021 16:23:14] INFO:  210/515 COMPLETE (40%) [ELAPSED TIME: 0 Min 15 Sec]
[05-13-2021 16:23:14] INFO:  220/515 COMPLETE (42%) [ELAPSED TIME: 0 Min 16 Sec]
[05-13-2021 16:23:15] INFO:  230/515 COMPLETE (44%) [ELAPSED TIME: 0 Min 16 Sec]
[05-13-2021 16:23:16] INFO:  240/515 COMPLETE (46%) [ELAPSED TIME: 0 Min 17 Sec]
[05-13-2021 16:23:16] INFO:  250/515 COMPLETE (48%) [ELAPSED TIME: 0 Min 18 Sec]
[05-13-2021 16:23:17] INFO:  260/515 COMPLETE (50%) [ELAPSED TIME: 0 Min 19 Sec]
[05-13-2021 16:23:18] INFO:  270/515 COMPLETE (52%) [ELAPSED TIME: 0 Min 19 Sec]
[05-13-2021 16:23:19] INFO:  280/515 COMPLETE (54%) [ELAPSED TIME: 0 Min 20 Sec]
[05-13-2021 16:23:20] INFO:  290/515 COMPLETE (56%) [ELAPSED TIME: 0 Min 21 Sec]
[05-13-2021 16:23:20] INFO:  300/515 COMPLETE (58%) [ELAPSED TIME: 0 Min 22 Sec]
[05-13-2021 16:23:21] INFO:  310/515 COMPLETE (60%) [ELAPSED TIME: 0 Min 23 Sec]
[05-13-2021 16:23:22] INFO:  320/515 COMPLETE (62%) [ELAPSED TIME: 0 Min 23 Sec]
[05-13-2021 16:23:23] INFO:  330/515 COMPLETE (64%) [ELAPSED TIME: 0 Min 24 Sec]
[05-13-2021 16:23:23] INFO:  340/515 COMPLETE (66%) [ELAPSED TIME: 0 Min 25 Sec]
[05-13-2021 16:23:24] INFO:  350/515 COMPLETE (67%) [ELAPSED TIME: 0 Min 26 Sec]
[05-13-2021 16:23:25] INFO:  360/515 COMPLETE (69%) [ELAPSED TIME: 0 Min 27 Sec]
[05-13-2021 16:23:26] INFO:  370/515 COMPLETE (71%) [ELAPSED TIME: 0 Min 27 Sec]
[05-13-2021 16:23:27] INFO:  380/515 COMPLETE (73%) [ELAPSED TIME: 0 Min 28 Sec]
[05-13-2021 16:23:28] INFO:  390/515 COMPLETE (75%) [ELAPSED TIME: 0 Min 29 Sec]
[05-13-2021 16:23:28] INFO:  400/515 COMPLETE (77%) [ELAPSED TIME: 0 Min 30 Sec]
[05-13-2021 16:23:29] INFO:  410/515 COMPLETE (79%) [ELAPSED TIME: 0 Min 31 Sec]
[05-13-2021 16:23:30] INFO:  420/515 COMPLETE (81%) [ELAPSED TIME: 0 Min 31 Sec]
[05-13-2021 16:23:31] INFO:  430/515 COMPLETE (83%) [ELAPSED TIME: 0 Min 32 Sec]
[05-13-2021 16:23:31] INFO:  440/515 COMPLETE (85%) [ELAPSED TIME: 0 Min 33 Sec]
[05-13-2021 16:23:32] INFO:  450/515 COMPLETE (87%) [ELAPSED TIME: 0 Min 34 Sec]
[05-13-2021 16:23:33] INFO:  460/515 COMPLETE (89%) [ELAPSED TIME: 0 Min 35 Sec]
[05-13-2021 16:23:34] INFO:  470/515 COMPLETE (91%) [ELAPSED TIME: 0 Min 35 Sec]
[05-13-2021 16:23:35] INFO:  480/515 COMPLETE (93%) [ELAPSED TIME: 0 Min 36 Sec]
[05-13-2021 16:23:35] INFO:  490/515 COMPLETE (95%) [ELAPSED TIME: 0 Min 37 Sec]
[05-13-2021 16:23:36] INFO:  500/515 COMPLETE (97%) [ELAPSED TIME: 0 Min 37 Sec]
[05-13-2021 16:23:36] INFO:  510/515 COMPLETE (99%) [ELAPSED TIME: 0 Min 38 Sec]
[05-13-2021 16:23:37] THREAD 0 FINISHED SUCCESSFULLY.
[05-13-2021 16:23:41] FINISHED IMAGE GENERATION
[05-13-2021 16:23:41] TOTAL ELAPSED TIME FOR IMAGE GENERATION: 0 Min 43 Sec
kishwarshafin commented 3 years ago

@huangnengCSU ,

Thank you for digging into the codebase. Yes, for training image generation, a region BED file covering the high-confidence regions is a mandatory parameter. If you are planning to train on an assembly, you need to lift over the GIAB high-confidence regions to the assembly and create a BED file from them. That BED file will be used to generate the examples for training.
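
As a rough sanity check that a BED file is in the same coordinate system as the FASTA passed to -f, the BED can be compared against the FASTA's `.fai` index (the tab-separated index written by `samtools faidx`, whose first two columns are contig name and length). The sketch below is not part of PEPPER; the function names `read_fai_lengths` and `check_bed_against_fai` and the toy data are hypothetical and only illustrate the idea.

```python
import io


def read_fai_lengths(fai_handle):
    """Map contig name -> contig length from a samtools-style .fai index."""
    lengths = {}
    for line in fai_handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    return lengths


def check_bed_against_fai(bed_handle, lengths):
    """Return (n_ok, problems) for a BED file checked against contig lengths.

    problems holds (contig, start, end, reason) for every interval whose
    contig is unknown or whose coordinates fall outside the contig.
    """
    n_ok, problems = 0, []
    for line in bed_handle:
        # Skip blank lines and the optional BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        contig, start, end = line.split()[:3]
        start, end = int(start), int(end)
        if contig not in lengths:
            problems.append((contig, start, end, "unknown contig"))
        elif not (0 <= start < end <= lengths[contig]):
            problems.append((contig, start, end, "out of bounds"))
        else:
            n_ok += 1
    return n_ok, problems


# Toy data standing in for a real .fai index and BED file.
fai = io.StringIO("chr1\t1000\t6\t60\t61\nchr2\t500\t1030\t60\t61\n")
bed = io.StringIO("chr1\t10\t200\n1\t10\t200\nchr2\t400\t600\n")

n_ok, problems = check_bed_against_fai(bed, read_fai_lengths(fai))
print(n_ok, problems)
```

With real files, the two `io.StringIO` objects would be replaced by `open(...)` handles on the `.fai` index and the BED file. A BED that was not lifted over to the assembly would typically show up here as unknown contig names or out-of-range coordinates, although passing this check does not prove the regions are correct.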

huangnengCSU commented 3 years ago

I have provided a BED file from GIAB (HG002_NA24385_son/NIST_SV_v0.6/HG002_SVs_Tier1_v0.6.bed), but the output image HDF files are all empty, and there are no errors or warnings.

kishwarshafin commented 3 years ago

@huangnengCSU these are regions with structural variants, so the truth will be collapsed around them. As you are using GIAB v3.3.2, you should use ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/release/AshkenazimTrio/HG002_NA24385_son/NISTv3.2.2/HG002_GIAB_highconf_IllFB-IllGATKHC-CG-Ion-Solid_CHROM1-22_v3.2.2_highconf.bed for your -rb parameter.
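
Before swapping BED files, it can help to see what a given file actually covers. The following is a minimal sketch, not PEPPER code, that counts intervals and bases per contig in a BED file; `summarize_bed` and the toy input are hypothetical. It does not merge overlapping intervals, so the base totals are an upper bound if the file contains overlaps.

```python
import io
from collections import defaultdict


def summarize_bed(bed_handle):
    """Return {contig: (interval_count, total_bases)} for a BED file.

    BED intervals are 0-based and half-open, so each one spans end - start bases.
    Overlapping intervals are NOT merged here.
    """
    stats = defaultdict(lambda: [0, 0])
    for line in bed_handle:
        # Skip blank lines and the optional BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        contig, start, end = line.split()[:3]
        stats[contig][0] += 1
        stats[contig][1] += int(end) - int(start)
    return {contig: tuple(values) for contig, values in stats.items()}


# Toy BED content standing in for a real file.
toy_bed = io.StringIO("chr1\t0\t100\nchr1\t150\t250\nchr2\t10\t20\n")
summary = summarize_bed(toy_bed)
for contig, (count, bases) in summary.items():
    print(contig, count, bases)
```

Running this on both the structural-variant BED and the high-confidence BED gives a quick view of how differently the two files cover each contig, and the contig names it prints can be compared against the COMMON CONTIGS FOUND line in the log.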

kishwarshafin commented 3 years ago

Also, the topic of this issue has diverged quite a bit. If you still face any issues with the new BED file, could you please open a new issue with an appropriate title? Closing this for now.