sjroth / ARTDeco

MIT License

Unmatched chrom.sizes and GTF #3

Closed by maxchang 5 years ago

maxchang commented 5 years ago

I'm trying to preprocess some data using a GTF from the hg19 iGenomes package. Some scaffolds present in the GTF are not in the genome sequence used for alignment or in the chrom.sizes file, which causes the KeyError below. Could ARTDeco use only the sequences that are common to both the GTF and chrom.sizes?

Running preprocess mode...
GTF file exists...
Gene annotation files exist...
Inferring BAM file formats...
All BAM files are Paired-End and Single-stranded...
Creating intergenic BED files...
Traceback (most recent call last):
  File "/home/mwchang/gpfs/miniconda3/envs/ARTDeco/bin/ARTDeco", line 11, in <module>
    load_entry_point('ARTDeco==0.2', 'console_scripts', 'ARTDeco')()
  File "/home/mwchang/gpfs/dev/ARTDeco/ARTDeco/main.py", line 270, in main
    upstream_dist=args.read_in_dist)
  File "/home/mwchang/gpfs/dev/ARTDeco/ARTDeco/preprocess.py", line 324, in create_unstranded_read_in_df
    read_in = format_read_in_df(read_in,min_len,chrom_lengths)
  File "/home/mwchang/gpfs/dev/ARTDeco/ARTDeco/preprocess.py", line 144, in format_read_in_df
    df['Max Len'] = df.apply(lambda x: x[4] < chrom_lengths[x[1]],axis=1)
  File "/home/mwchang/gpfs/miniconda3/envs/ARTDeco/lib/python3.6/site-packages/pandas/core/frame.py", line 6487, in apply
    return op.get_result()
  File "/home/mwchang/gpfs/miniconda3/envs/ARTDeco/lib/python3.6/site-packages/pandas/core/apply.py", line 151, in get_result
    return self.apply_standard()
  File "/home/mwchang/gpfs/miniconda3/envs/ARTDeco/lib/python3.6/site-packages/pandas/core/apply.py", line 257, in apply_standard
    self.apply_series_generator()
  File "/home/mwchang/gpfs/miniconda3/envs/ARTDeco/lib/python3.6/site-packages/pandas/core/apply.py", line 286, in apply_series_generator
    results[i] = self.f(v)
  File "/home/mwchang/gpfs/dev/ARTDeco/ARTDeco/preprocess.py", line 144, in <lambda>
    df['Max Len'] = df.apply(lambda x: x[4] < chrom_lengths[x[1]],axis=1)
KeyError: ('chr1_GL456221_random', 'occurred at index 11241')

sjroth commented 5 years ago

Yes, this seems to be a limitation. I will update the README to reflect it.
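
Until this is handled inside the tool, one possible workaround is to filter the GTF before running preprocess, so that every sequence name it contains also appears in chrom.sizes. The traceback shows the failure comes from the lookup `chrom_lengths[x[1]]` hitting a scaffold name that has no entry. The sketch below is not part of ARTDeco: `load_chrom_sizes` and `filter_gtf` are hypothetical helper names, and it assumes a tab-delimited GTF and a two-column chrom.sizes file. The in-memory sample records are illustrative only.

```python
import io


def load_chrom_sizes(handle):
    """Parse a two-column chrom.sizes file (name<TAB>length) into a dict."""
    sizes = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, length = line.split("\t")[:2]
        sizes[name] = int(length)
    return sizes


def filter_gtf(gtf_handle, out_handle, chrom_sizes):
    """Copy GTF records whose sequence name is in chrom_sizes.

    Returns the set of sequence names that were dropped.
    """
    dropped = set()
    for line in gtf_handle:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            out_handle.write(line)  # keep header/comment lines
            continue
        seqname = line.split("\t", 1)[0]
        if seqname in chrom_sizes:
            out_handle.write(line)
        else:
            dropped.add(seqname)
    return dropped


# Small in-memory demonstration (sample data, not taken from a real run).
sizes = load_chrom_sizes(io.StringIO("chr1\t249250621\nchr2\t243199373\n"))
gtf = io.StringIO(
    'chr1\tunknown\texon\t100\t200\t.\t+\t.\tgene_id "A";\n'
    'chr1_GL456221_random\tunknown\texon\t10\t50\t.\t+\t.\tgene_id "B";\n'
)
filtered = io.StringIO()
dropped = filter_gtf(gtf, filtered, sizes)
```

With real files, the same two functions can be called on handles from `open(...)`, and the returned `dropped` set gives a quick list of the scaffolds that were removed.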