I'm trying to preprocess some data using a GTF from the hg19 iGenomes package. Some scaffolds present in the GTF are not in the genome sequence used for alignment, nor in the chrom.sizes file. Could ARTDeco use only the sequences that are common to both the GTF and chrom.sizes? Preprocess mode fails with the following output:
Running preprocess mode...
GTF file exists...
Gene annotation files exist...
Inferring BAM file formats...
All BAM files are Paired-End and Single-stranded...
Creating intergenic BED files...
Traceback (most recent call last):
  File "/home/mwchang/gpfs/miniconda3/envs/ARTDeco/bin/ARTDeco", line 11, in <module>
    load_entry_point('ARTDeco==0.2', 'console_scripts', 'ARTDeco')()
  File "/home/mwchang/gpfs/dev/ARTDeco/ARTDeco/main.py", line 270, in main
    upstream_dist=args.read_in_dist)
  File "/home/mwchang/gpfs/dev/ARTDeco/ARTDeco/preprocess.py", line 324, in create_unstranded_read_in_df
    read_in = format_read_in_df(read_in,min_len,chrom_lengths)
  File "/home/mwchang/gpfs/dev/ARTDeco/ARTDeco/preprocess.py", line 144, in format_read_in_df
    df['Max Len'] = df.apply(lambda x: x[4] < chrom_lengths[x[1]],axis=1)
  File "/home/mwchang/gpfs/miniconda3/envs/ARTDeco/lib/python3.6/site-packages/pandas/core/frame.py", line 6487, in apply
    return op.get_result()
  File "/home/mwchang/gpfs/miniconda3/envs/ARTDeco/lib/python3.6/site-packages/pandas/core/apply.py", line 151, in get_result
    return self.apply_standard()
  File "/home/mwchang/gpfs/miniconda3/envs/ARTDeco/lib/python3.6/site-packages/pandas/core/apply.py", line 257, in apply_standard
    self.apply_series_generator()
  File "/home/mwchang/gpfs/miniconda3/envs/ARTDeco/lib/python3.6/site-packages/pandas/core/apply.py", line 286, in apply_series_generator
    results[i] = self.f(v)
  File "/home/mwchang/gpfs/dev/ARTDeco/ARTDeco/preprocess.py", line 144, in <lambda>
    df['Max Len'] = df.apply(lambda x: x[4] < chrom_lengths[x[1]],axis=1)
KeyError: ('chr1_GL456221_random', 'occurred at index 11241')
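The KeyError comes from the `chrom_lengths[x[1]]` lookup hitting a sequence name that is not in chrom.sizes. As a possible workaround until this is handled inside ARTDeco, the GTF could be pre-filtered so that it only contains sequences listed in chrom.sizes. Below is a minimal sketch of that idea, not ARTDeco's own code: the helper names `load_chrom_names` and `filter_gtf` are hypothetical, and the records are toy data standing in for the real files.

```python
import io


def load_chrom_names(chrom_sizes_handle):
    """Return the set of sequence names in a chrom.sizes file (name<TAB>length)."""
    names = set()
    for line in chrom_sizes_handle:
        fields = line.split()
        if fields:  # skip blank lines
            names.add(fields[0])
    return names


def filter_gtf(gtf_handle, chrom_names):
    """Yield GTF lines whose sequence name (first column) is in chrom_names.

    Comment lines starting with '#' are passed through unchanged.
    """
    for line in gtf_handle:
        if line.startswith('#'):
            yield line
            continue
        seqname = line.split('\t', 1)[0]
        if seqname in chrom_names:
            yield line


# Toy inputs (hypothetical records, used only to illustrate the filtering).
chrom_sizes = io.StringIO("chr1\t249250621\nchr2\t243199373\n")
gtf = io.StringIO(
    'chr1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "geneA";\n'
    'chr1_GL456221_random\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "geneB";\n'
    'chr2\ttoy\texon\t300\t400\t.\t-\t.\tgene_id "geneC";\n'
)

kept = list(filter_gtf(gtf, load_chrom_names(chrom_sizes)))
```

With real files, the two `io.StringIO` objects would be replaced by opened file handles, and the kept lines written out to a new GTF that is then passed to ARTDeco.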