sheynkman-lab / Long-Read-Proteogenomics

A workflow for enhanced protein isoform detection through integration of long-read RNA-seq and mass spectrometry-based proteomics.
MIT License
37 stars, 15 forks

Get empty results after make_pacbio_cds_gtf.py #171

Closed weidonggan closed 1 month ago

weidonggan commented 1 month ago

Hi, I am running the pipeline step by step, following the main.nf file. When I run make_pacbio_cds_gtf.py, I get an empty result. I also noticed that the make_pacbio_cds_gtf.py code in main.nf differs from the version in modules. Do you know what could cause this problem?

This is my code and data:

python make_pacbio_cds_gtf.py --sample_gtf jurkat_corrected.5degfilter.gff --agg_orfs jurkat_orf_refined.tsv --refined_orfs jurkat_best_orf.tsv --pb_gene pb_gene.tsv --output_cds jurkat_cds.gtf

jurkat_data.zip
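
When an output such as jurkat_cds.gtf looks empty, it can help to check whether the file has no records at all or only lacks CDS records. The following is a minimal sketch, assuming a standard tab-separated, 9-column GTF/GFF layout with the feature type in column 3. The helper name count_gtf_features and the in-memory example records are hypothetical and are not part of the pipeline.

```python
from collections import Counter
import io


def count_gtf_features(handle):
    """Count feature types (column 3) in a GTF/GFF stream, skipping comments."""
    counts = Counter()
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank line or header/comment line
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a well-formed 9-column record
        counts[fields[2]] += 1
    return counts


# Tiny made-up example held in memory; for a real file, pass an open handle,
# e.g. `with open("jurkat_cds.gtf") as fh: count_gtf_features(fh)`.
example = io.StringIO(
    "# illustrative records only\n"
    'chr1\tPacBio\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "PB.1.1";\n'
    'chr1\tPacBio\tCDS\t120\t180\t.\t+\t0\tgene_id "G1"; transcript_id "PB.1.1";\n'
)
counts = count_gtf_features(example)
print(dict(counts))
```

Running the same count on the input GTF and on the output shows whether the input has exon records at all and whether any CDS records were written.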

efwatts commented 1 month ago

Hello, we recently discovered an error in the script make_pacbio_cds_gtf.py. Attached is a version of that script that should run properly; it is in .txt format because of GitHub's upload constraints. I also had to use the Docker image to get this module to run (through Apptainer, formerly known as Singularity). This is the command I've gotten to work:

apptainer exec pb-cds-gtf_latest.sif /bin/bash -c " \
  python 07_make_pacbio_cds_gtf.py \
  --sample_gtf filtered_jurkat_corrected.gtf \
  --agg_orfs jurkat_orf_refined.tsv \
  --refined_orfs jurkat_best_ORF.tsv \
  --pb_gene pb_gene.tsv \
  --output_cds jurkat_cds.gtf "

07_make_pacbio_cds_gtf.py.txt (https://github.com/user-attachments/files/15533843/07_make_pacbio_cds_gtf.py.txt)

weidonggan commented 1 month ago

Thanks, it worked.


efwatts commented 1 month ago

Great!

weidonggan commented 1 month ago

Excuse me, I have a new problem when running sqanti3_protein.py.


python sqanti3_protein.py jurkat.transcript_exons_only.gtf jurkat.cds_renamed_exon.gtf jurkat_best_orf.tsv gencode.transcript_exons_only.gtf gencode.cds_renamed_exon.gtf -d /full-length/test1 -p jurkat

Rscript (R) version 4.3.2 (2023-10-31)
Parsing Reference Transcriptome....
Parsing Isoforms....
**** Performing Classification of Isoforms....
Traceback (most recent call last):
  File "sqanti3_protein.py", line 1093, in <module>
    indelsJunc=None)
  File "sqanti3_protein.py", line 780, in isoformClassification
    isoform_hit = transcriptsKnownSpliceSites(refs_1exon_by_chr, refs_exons_by_chr, start_ends_by_gene, rec, genome_dict, nPolyA=args.window)
  File "sqanti3_protein.py", line 364, in transcriptsKnownSpliceSites
    seqAdownTTS=seq_downTTS)
  File "sqanti3_protein.py", line 187, in __init__
    polyA_motif=polyA_motif, polyA_dist=polyA_dist)
TypeError: __init__() got an unexpected keyword argument 'dist_cage'

Do you know what could cause this problem?

test_data.zip: https://1drv.ms/u/c/36f75ce114fad70e/EZmm95HmWYlBjrDYjRx5q1EBGaNXP6Up-AY_bgxiu44BQw


efwatts commented 1 month ago

Hi,

I think we've made modifications to this script, because the line numbers in your traceback don't match the lines in my copy. Here's the SQANTI protein script I've recently gotten to work successfully! The command you're using looks great, so hopefully this script works for you.

09_sqanti_protein.py.txt (https://github.com/user-attachments/files/15551746/09_sqanti_protein.py.txt)
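
For context, a TypeError of the form "got an unexpected keyword argument" usually means the calling code and the class it calls come from different versions. The following is a minimal sketch of that failure mode, not the actual SQANTI code: the class OlderRecord and the helper accepts_keyword are hypothetical, and only the keyword names polyA_motif, polyA_dist, and dist_cage are taken from the traceback above.

```python
import inspect


class OlderRecord:
    """Hypothetical stand-in for a class whose __init__ predates `dist_cage`."""

    def __init__(self, polyA_motif=None, polyA_dist=None):
        self.polyA_motif = polyA_motif
        self.polyA_dist = polyA_dist


def accepts_keyword(cls, name):
    """Return True if cls.__init__ can take `name` as a keyword argument."""
    params = inspect.signature(cls.__init__).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return True  # a **kwargs parameter accepts any keyword
    return name in params


# A caller written against a newer version passes a keyword that the older
# class does not know about, which raises TypeError at call time.
try:
    OlderRecord(polyA_motif="AATAAA", polyA_dist=-20, dist_cage=5)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

print(accepts_keyword(OlderRecord, "dist_cage"))
```

If a check like this returns False for the class named in a traceback, the script and the module it imports are probably out of sync, and switching to a matching pair of files is the likely fix.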

weidonggan commented 1 month ago

Thank you for such a great tool that has taught me so much.

