maiziex / Aquila

Diploid personal genome assembly and comprehensive variant detection based on linked-reads
MIT License
Can't untar reference files #5

Open mzwaig opened 1 year ago

mzwaig commented 1 year ago

Hi,

I'm trying to download the reference and uniqueness files to run Aquila, but the downloaded .tar.gz files appear to be HTML documents, so I can't extract them.

Best, Melissa

(/lb/project/tools/conda/Aquila) Wed Mar 01 08:51:26 /lb/project/tools/Aquila $ tar xvf source.tar.gz
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
(/lb/project/tools/conda/Aquila) Wed Mar 01 08:51:32 /lb/project/tools/Aquila $ file source.tar.gz
source.tar.gz: HTML document, ASCII text, with very long lines, with no line terminators
(/lb/project/tools/conda/Aquila) Wed Mar 01 08:51:38 /lb/project/tools/Aquila $ file Uniqness_map.tar.gz
Uniqness_map.tar.gz: HTML document, ASCII text, with very long lines, with no line terminators

maiziex commented 1 year ago

Thanks for reporting this error. Unfortunately, the previous download links have expired. Please download the reference and uniqueness files from Zenodo (https://doi.org/10.5281/zenodo.7689958). I will update the links on GitHub.
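
A quick way to catch this kind of problem before calling tar is to look at the first two bytes of each download: every gzip stream starts with 0x1f 0x8b, while a saved error page starts with HTML text. The following is only a minimal sketch; the function name looks_like_gzip is hypothetical, and the file names are the ones from the report above.

```python
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_like_gzip(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


if __name__ == "__main__":
    # File names taken from the report above; adjust to your download location.
    for name in ("source.tar.gz", "Uniqness_map.tar.gz"):
        if not Path(name).exists():
            print(f"{name}: not found")
        elif looks_like_gzip(name):
            print(f"{name}: gzip data")
        else:
            print(f"{name}: not gzip (possibly a saved HTML page); re-download")
```

This only confirms the container format. It does not verify that the archive is complete or that its contents match what Aquila expects.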

mzwaig commented 1 year ago

Thank you!

I'm having a second issue: I get the message below when I try to run the first step. My SNP calling was done with GATK, not FreeBayes (still through Long Ranger). Could that be causing the issue?

Traceback (most recent call last):
  File "/lb/project/tools/Aquila/bin/Run_h5_all_multithreads.py", line 62, in Cal_snp_ratio_vs_depth
    AO_idx = _format.index("AO")
ValueError: 'AO' is not in list

maiziex commented 1 year ago

Yes, sorry for the inconvenience. Aquila currently only accepts VCF files produced by FreeBayes.

mzwaig commented 1 year ago

Could I solve that by modifying the vcf parsing in Run_h5_all_multithreads.py or is the vcf used in other steps as well?

maiziex commented 1 year ago

Yes, you can just modify the VCF parsing in Run_h5_all_multithreads.py to handle the GATK VCF format; no other steps need to change. For an example, search the README for "We now have a new version for step1 to use 1000 Genomes VCF as the input VCF file (please check here), and Aquila will use common variants from 1000G to help partition linked-reads. In the later version, Aquila will use Graph Genome Reference to replace Conventional Linear Reference." For that version I added a Python script, "Run_h5_all_multithreads_GenRef.py", in the bin folder to parse the 1000 Genomes VCF file. You can do the same thing for a GATK VCF.
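
For anyone attempting the same change: the traceback above comes from looking up "AO" in the FORMAT column. FreeBayes typically reports per-sample read support as RO (reference observations) and AO (alternate observations), whereas GATK reports it as AD (comma-separated allelic depths, reference first). The sketch below shows one way to read either layout. It assumes the step only needs reference and alternate read counts per site, which has not been checked against the Aquila source, and the function name ref_alt_counts is hypothetical.

```python
def ref_alt_counts(format_field, sample_field):
    """Return (ref_count, alt_count) from one VCF sample column, or None.

    Handles FreeBayes-style RO/AO and GATK-style AD. For multi-allelic
    sites only the first alternate allele is counted (a simplification).
    """
    keys = format_field.split(":")
    values = sample_field.split(":")
    record = dict(zip(keys, values))

    if "AO" in record and "RO" in record:
        # FreeBayes: RO = reference observations, AO = alternate observations
        ref = record["RO"]
        alt = record["AO"].split(",")[0]
    elif "AD" in record:
        # GATK: AD = allelic depths, reference allele first
        depths = record["AD"].split(",")
        if len(depths) < 2:
            return None
        ref, alt = depths[0], depths[1]
    else:
        return None

    if ref == "." or alt == ".":
        return None  # missing values
    return int(ref), int(alt)
```

Called with the FORMAT column and the sample column of a record, for example ref_alt_counts("GT:AD:DP:GQ:PL", "0/1:10,7:17:99:200,0,300"), it returns the pair of counts, and it returns None when neither tag is present so the caller can skip the site instead of raising ValueError.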

mzwaig commented 1 year ago

Great. Thanks!

mzwaig commented 1 year ago

Hi,

I've modified step 1 to run with the GATK output, but I'm getting another error, which I've included below. This is the first error message I get, so I'm not sure why the files in results_phased_probmodel aren't being generated.

Thanks, Melissa

Traceback (most recent call last):
  File "/lb/project/ioannisr/Melissa-abacus/tools/Aquila/bin/Cut_phaseblock_for_phased_h5_v4.0_highconf_v2.py", line 277, in <module>
    Cut_phaseblock_for_phased_h5(file_name,chr_num,out_file,block_len_use,block_threshold,output_dir,bed_file,phase_block_file,global_track,HC_breakpoint_file,"xin")
  File "/lb/project/ioannisr/Melissa-abacus/tools/Aquila/bin/Cut_phaseblock_for_phased_h5_v4.0_highconf_v2.py", line 99, in Cut_phaseblock_for_phased_h5
    f = open(h5_phased_file,"r")
FileNotFoundError: [Errno 2] No such file or directory: '/lb/project/ioannisr/NOBACKUP/Melissa-nobackup/Luigi-Gen3G/Aquila/1003C/results_phased_probmodel/chr1.phased_final'
[bam_sort_core] merging from 620 files and 20 in-memory blocks...
[E::idx_find_and_load] Could not retrieve index file for '/lb/project/ioannisr/NOBACKUP/Melissa-nobackup/Luigi-Gen3G/Aquila/1003C/sorted_bam/sorted_bam.bam'

maiziex commented 1 year ago

python Aquila/bin/Aquila_step0_sortbam.py --bam_file possorted_bam.bam --out_dir Assembly_results_S12878 --num_threads_for_samtools_sort 30

Can you run step0 first?

mzwaig commented 1 year ago

Hi,

It generated a sorted_bam.bam file but no index, and I was unable to index it with samtools either.

Thanks, Melissa

maiziex commented 1 year ago

That's odd. This step only runs "samtools sort" (https://github.com/maiziex/Aquila/blob/master/bin/Aquila_step0_sortbam.py). You may want to check your input BAM file and make sure it is not truncated.
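
One way to test for truncation: a complete BAM file ends with a fixed 28-byte empty BGZF block, the EOF marker defined in the SAM/BAM specification, so a file whose last 28 bytes differ was most likely cut short. The samtools quickcheck command performs a similar check. The sketch below does the same in Python; the function name bam_has_eof_marker is hypothetical, and the check says nothing about damage elsewhere in the file.

```python
import os

# 28-byte empty BGZF block that terminates a complete BAM file
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff0600 42430200 1b000300 00000000 00000000"
)


def bam_has_eof_marker(path):
    """Return True if the file ends with the BGZF end-of-file block."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        # Read only the final 28 bytes rather than the whole file.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

Running this on both the input possorted_bam.bam and the generated sorted_bam.bam would show whether the truncation was already present in the input or was introduced during sorting, for example by a full disk or an interrupted job.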