uclahs-cds / package-moPepGen

Multi-Omics Peptide Generator
https://uclahs-cds.github.io/package-moPepGen/
GNU General Public License v2.0

circRNA + noncoding variants reference something not found #270

Closed lydiayliu closed 2 years ago

lydiayliu commented 2 years ago

Was at 112000 transcripts, so close to finishing!! Still working with CPCG0259

I have no name!@89866c2e1a10:/$ moPepGen callVariant \                                                                                                                      
>     --input-variant /hot/users/yiyangliu/MoPepGen/Parser/CIRCexplorer3/TOPHAT/${c}_IP_quant.txt.1.3ff.gvf \                                                                                       
>         /hot/users/yiyangliu/MoPepGen/Parser/VEP/gencode/gsnp/${b} \                                                                                                                              
>         /hot/users/yiyangliu/MoPepGen/Parser/VEP/gencode/gindel/${b} \                                                                                                                            
>         /hot/users/yiyangliu/MoPepGen/Parser/VEP/gencode/somaticsniper/${b} \                                                                                                                     
>         /hot/users/yiyangliu/MoPepGen/Parser/VEP/gencode/pindel/${b} \                                                                                                                            
>     --index-dir /hot/users/yiyangliu/MoPepGen/Index/GRCh38-EBI-GENCODE34/ \                                                                                                                       
>     --output-fasta /hot/users/yiyangliu/MoPepGen/Variant/CIRCexplorer3/TOPHAT/circ_ssm/${c}.3f.fasta                                                                                              
[ 2021-12-03 01:35:25 ] moPepGen callVariant started                                                                                                                        
[ 2021-12-03 01:36:38 ] Variant file /hot/users/yiyangliu/MoPepGen/Parser/CIRCexplorer3/TOPHAT/CPCG0259_IP_quant.txt.1.3ff.gvf loaded.                                                              
[ 2021-12-03 01:42:53 ] Variant file /hot/users/yiyangliu/MoPepGen/Parser/VEP/gencode/gsnp/CPCG0259.gencode.tsv.gvf loaded.                                                                         
[ 2021-12-03 01:43:52 ] Variant file /hot/users/yiyangliu/MoPepGen/Parser/VEP/gencode/gindel/CPCG0259.gencode.tsv.gvf loaded.                                                                       
[ 2021-12-03 01:43:53 ] Variant file /hot/users/yiyangliu/MoPepGen/Parser/VEP/gencode/somaticsniper/CPCG0259.gencode.tsv.gvf loaded.                                                                
[ 2021-12-03 01:43:53 ] Variant file /hot/users/yiyangliu/MoPepGen/Parser/VEP/gencode/pindel/CPCG0259.gencode.tsv.gvf loaded.

[ 2021-12-03 19:21:36 ] 112000 transcripts processed.
Traceback (most recent call last):
  File "/usr/local/bin/moPepGen", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/__main__.py", line 77, in main
    args.func(args)
  File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 118, in call_variant_peptide
    peptides = call_peptide_circ_rna(circ_rna, anno, genome,
  File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 183, in call_peptide_circ_rna
    cgraph.fit_into_codons()
  File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/ThreeFrameTVG.py", line 991, in fit_into_codons
    node = self.expand_alignments(cur)
  File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/ThreeFrameTVG.py", line 945, in expand_alignments
    ref_node = start.get_reference_next()
  File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/TVGNode.py", line 169, in get_reference_next
    raise ValueError('No reference edge was found.')
ValueError: No reference edge was found.

zhuchcn commented 2 years ago

Was the transcript ID not printed out?

lydiayliu commented 2 years ago

no sorry all i got is

[ 2021-12-03 19:21:36 ] 112000 transcripts processed.                                                                                                                       
Traceback (most recent call last):                                                    
  File "/usr/local/bin/moPepGen", line 8, in <module>   

zhuchcn commented 2 years ago

Did it run 18 hours??

lydiayliu commented 2 years ago

no i copied the top part from another issue cuz i was too lazy to fix the paths

sorry!!!!

[ 2021-12-03 16:33:36 ] moPepGen callVariant started                                                                                                                        
[ 2021-12-03 16:34:48 ] Variant file /data/Parser/CIRCexplorer3/TOPHAT/CPCG0259_IP_quant.txt.1.3ff.gvf loaded.                                                              
[ 2021-12-03 16:41:03 ] Variant file /data/Parser/VEP/gencode/gsnp/CPCG0259.gencode.tsv.gvf loaded.                                                                         
[ 2021-12-03 16:42:02 ] Variant file /data/Parser/VEP/gencode/gindel/CPCG0259.gencode.tsv.gvf loaded.
[ 2021-12-03 16:42:02 ] Variant file /data/Parser/VEP/gencode/somaticsniper/CPCG0259.gencode.tsv.gvf loaded.
[ 2021-12-03 16:42:02 ] Variant file /data/Parser/VEP/gencode/pindel/CPCG0259.gencode.tsv.gvf loaded.
[ 2021-12-03 16:42:26 ] Variant records sorted.
[ 2021-12-03 16:44:05 ] 1000 transcripts processed.
[ 2021-12-03 16:45:09 ] 2000 transcripts processed.
[ 2021-12-03 16:46:18 ] 3000 transcripts processed.
[ 2021-12-03 16:48:14 ] 4000 transcripts processed.

just 3 hours
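
As a quick check on the runtime, here is a minimal sketch that subtracts two of the bracketed timestamps from the logs above. It assumes the `[ YYYY-MM-DD HH:MM:SS ]` prefix seen in this thread and that the 19:21:36 crash line belongs to the same run as the 16:33:36 start line; the helper names `parse_stamp` and `elapsed` are made up for illustration.

```python
from datetime import datetime, timedelta

# Timestamp layout as it appears in the log lines quoted in this thread.
FMT = "%Y-%m-%d %H:%M:%S"


def parse_stamp(line: str) -> datetime:
    """Pull the timestamp out of a line like '[ 2021-12-03 16:33:36 ] ...'."""
    stamp = line.split("]", 1)[0].lstrip("[ ").strip()
    return datetime.strptime(stamp, FMT)


def elapsed(first_line: str, last_line: str) -> timedelta:
    """Time between two log lines (hypothetical helper, not part of moPepGen)."""
    return parse_stamp(last_line) - parse_stamp(first_line)


start = "[ 2021-12-03 16:33:36 ] moPepGen callVariant started"
crash = "[ 2021-12-03 19:21:36 ] 112000 transcripts processed."
print(elapsed(start, crash))  # 2:48:00
```

With the header that was copied from the other issue (started 01:35:25), the same subtraction gives 17:46:11, which is where the "18 hours" impression came from.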

zhuchcn commented 2 years ago

Haha that's funny. I know why the transcript ID isn't printed out, because it's circRNA. I'll look into it.
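
For illustration only, the general pattern looks something like the sketch below: wrap the per-record work in a `try`/`except`, log the record ID, and re-raise so the traceback is kept. This is not moPepGen's actual code; `log`, `process_all`, and the `(record_id, payload)` shape are hypothetical names chosen for the example.

```python
from datetime import datetime


def log(message: str) -> None:
    """Print a message with the same bracketed timestamp style as the logs here."""
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    print(f"[ {stamp} ] {message}", flush=True)


def process_all(records, handler):
    """Run handler on each (record_id, payload) pair.

    If handler fails, report which record was being processed and re-raise,
    so the original traceback still reaches the user.
    """
    results = {}
    for record_id, payload in records:
        try:
            results[record_id] = handler(payload)
        except Exception:
            log(f"Exception raised from {record_id}")
            raise
    return results
```

The point is that the same wrapper has to cover every loop that can fail, not just the one over linear transcripts.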

lydiayliu commented 2 years ago

waited three hours to get back to the point this died last time XD died again but at least we know where...

[ 2021-12-09 19:53:23 ] 110000 transcripts processed.
[ 2021-12-09 19:54:16 ] 111000 transcripts processed.
[ 2021-12-09 19:55:04 ] 112000 transcripts processed.
[ 2021-12-09 19:57:06 ] Exception raised from CIRC-ENST00000341594.9-E81-E82
Traceback (most recent call last):
  File "/usr/local/bin/moPepGen", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/__main__.py", line 77, in main
    args.func(args)
  File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 120, in call_variant_peptide
    peptides = call_peptide_circ_rna(circ_rna, anno, genome,
  File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 178, in call_peptide_circ_rna
    variant_records = variant_pool.filter_variants(gene_id, annotation, genome,
  File "/usr/local/lib/python3.8/site-packages/moPepGen/seqvar/VariantRecordPool.py", line 154, in filter_variants
    record_gene = anno.variant_coordinates_to_gene(record, gene_id)
  File "/usr/local/lib/python3.8/site-packages/moPepGen/gtf/GenomicAnnotation.py", line 274, in variant_coordinates_to_gene
    end_gene = self.genes[gene_id].location.start - end_genomic
TypeError: unsupported operand type(s) for -: 'ExactPosition' and 'NoneType'
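
Reading the last frame: the right-hand operand `end_genomic` was `None` when the subtraction ran, so some earlier coordinate lookup returned nothing for this circRNA. A minimal sketch of that failure mode and one possible guard is below. It uses plain integers and a hypothetical function name, and it is only an assumption about the cause, not the fix that was eventually made.

```python
from typing import Optional


def offset_from_gene_start(gene_start: int, genomic_pos: Optional[int]) -> int:
    """Mirror the 'start - end_genomic' arithmetic from the traceback.

    Hypothetical helper: fails early with a readable message when the
    genomic position could not be resolved, instead of letting None
    reach the subtraction.
    """
    if genomic_pos is None:
        raise ValueError("variant coordinate could not be mapped to the genome")
    return gene_start - genomic_pos


# Without the guard, the bare subtraction reproduces the same kind of error.
try:
    1000 - None
except TypeError as err:
    print(type(err).__name__)  # TypeError
```

A guard like this would not fix the underlying mapping problem, but it would point at the missing coordinate instead of at the arithmetic.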

lydiayliu commented 2 years ago

Are you still working on this one? was it fixed in #290?

lydiayliu commented 2 years ago

Something changed so that this runs, but output might be funny.

moPepGen-util downsampleReference \
    --tx-list ENST00000341594.9 \
    --output-dir /data/Index/ENST00000341594.9/ \
    --genome-fasta /reference/GRCh38.p13.genome.fa \
    --annotation-gtf /reference/gencode.v34.chr_patch_hapl_scaff.annotation.gtf \
    --proteome-fasta /reference/gencode.v34.pc_translations.fa \
    --translate-noncoding true

Then manually grep the # header lines and the ENST00000341594.9 records from each GVF.
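
A rough Python equivalent of that manual grep step, in case it needs repeating for other transcripts. This is a sketch under two assumptions: header and metadata lines in the GVF start with `#`, and the transcript ID appears as plain text somewhere in each record of interest. `filter_gvf` is a hypothetical helper, not a moPepGen command.

```python
def filter_gvf(src: str, dest: str, transcript_id: str) -> int:
    """Copy header lines and records mentioning transcript_id from src to dest.

    Returns the number of non-header records kept.
    """
    kept = 0
    with open(src) as fin, open(dest, "w") as fout:
        for line in fin:
            if line.startswith("#"):
                # Keep every header/metadata line so the file stays parseable.
                fout.write(line)
            elif transcript_id in line:
                fout.write(line)
                kept += 1
    return kept
```

Unlike a bare `grep '#'`, this only keeps lines that begin with `#`, so a record that happens to contain `#` in one of its fields is not pulled in by accident.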

moPepGen callVariant \
    --input-variant /hot/users/yiyangliu/MoPepGen/Variant/CIRCexplorer3/TOPHAT/circ_ssm/CPCG0259_ENST00000341594.9/circ.gvf \
        /hot/users/yiyangliu/MoPepGen/Variant/CIRCexplorer3/TOPHAT/circ_ssm/CPCG0259_ENST00000341594.9/gsnp.gvf \
        /hot/users/yiyangliu/MoPepGen/Variant/CIRCexplorer3/TOPHAT/circ_ssm/CPCG0259_ENST00000341594.9/gindel.gvf \
        /hot/users/yiyangliu/MoPepGen/Variant/CIRCexplorer3/TOPHAT/circ_ssm/CPCG0259_ENST00000341594.9/somaticsniper.gvf \
        /hot/users/yiyangliu/MoPepGen/Variant/CIRCexplorer3/TOPHAT/circ_ssm/CPCG0259_ENST00000341594.9/pindel.gvf \
    --genome-fasta /hot/users/yiyangliu/MoPepGen/Index/ENST00000341594.9/genome.fasta \
    --annotation-gtf /hot/users/yiyangliu/MoPepGen/Index/ENST00000341594.9/annotation.gtf \
    --proteome-fasta /hot/users/yiyangliu/MoPepGen/Index/ENST00000341594.9/proteome.fasta \
    --output-fasta /hot/users/yiyangliu/MoPepGen/Variant/CIRCexplorer3/TOPHAT/circ_ssm/CPCG0259_ENST00000341594.9/output.fasta

Ran in a second.

zhuchcn commented 2 years ago

> Are you still working on this one? was it fixed in #290?

I don't think so. I'm still working on #289

zhuchcn commented 2 years ago

This has been fixed in b21480c. Something was indeed messed up, so the variants were probably not being incorporated properly. Now we have some variant peptides in the FASTA! The branch is based on czhu-fix-fusion, so let's merge #289 first.

lydiayliu commented 2 years ago

im on this, testing b21480c with --noncanonical-transcripts to massively save time!

lydiayliu commented 2 years ago

this sample ran through!! gonna call it a day.

can you open a PR for b21480c so we can close this issue there?