maiziex / Aquila

Diploid personal genome assembly and comprehensive variant detection based on linked-reads
MIT License

about reference #1

Open GeorgeBGM opened 4 years ago

GeorgeBGM commented 4 years ago

Do you plan to move the reference sequence from a linear genome to a graph genome in a later version?

maiziex commented 4 years ago

Thank you for your comment. This is an area that we are working on and hope to implement in a later version.

GeorgeBGM commented 4 years ago

Hi maiziex, Thanks for the super-quick response! I got several error messages when I ran Aquila_step2 on my 10X linked-reads aligned to chromosome 6 (the error output was attached as an image). As a result, only the minicontig file was generated in the _Assembly_Contigsfiles directory, with no contig file. I found a report of this spades-core error suggesting that it may be caused by low coverage (https://github.com/ibest/ARC/issues/22). Can you give me some advice? Thanks, I am looking forward to your reply. Best, Du

maiziex commented 4 years ago

Hi Du, What's the coverage of your dataset? Can you also check the depth in the file "H5_for_molecules/median_depth_for_var.txt"?
Please also let me know the total size of the folder "Local_Assembly_by_chunks/chr6_files_cutPBHC" and the total number of fastq files in this local assembly folder for chr6.

Thanks Maizie
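
The three numbers requested above could be collected with a short script. This is only a sketch, not part of Aquila. It assumes that the depth file holds a single number as its first token, and that the fastq files sit directly inside the chunk folder with names starting with `fastq_by_` (a prefix taken from file names quoted elsewhere in this thread). The function names are hypothetical.

```python
from pathlib import Path


def read_median_depth(path):
    """Read the first whitespace-separated token of the file as a float.

    Assumption: the file contains a single number such as 9.0.
    """
    return float(Path(path).read_text().split()[0])


def folder_size_bytes(folder):
    """Sum the sizes of all regular files under `folder`, recursively."""
    return sum(p.stat().st_size for p in Path(folder).rglob("*") if p.is_file())


def count_fastq_files(folder, prefix="fastq_by_"):
    """Count regular files directly inside `folder` whose names start with `prefix`.

    Assumption: per-chunk fastq files are direct children of the folder;
    subdirectories (for example SPAdes output directories) are not counted.
    """
    return sum(
        1
        for p in Path(folder).iterdir()
        if p.is_file() and p.name.startswith(prefix)
    )
```

For example, `count_fastq_files("Local_Assembly_by_chunks/chr6_files_cutPBHC")` would give the file count, and `folder_size_bytes(...) / 1e6` the folder size in megabytes.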

GeorgeBGM commented 4 years ago

Hi Maizie, Thanks for your reply! My data has a serious amplification imbalance: the mean coverage is about 10X, but several regions are above 70X. I didn't get satisfactory results with the longranger software, so I want to try Aquila. The depth in the file "H5_for_molecules/median_depth_for_var.txt" is 9.0. The total size of the folder "Local_Assembly_by_chunks/chr6_files_cutPBHC" is 410M, and the total number of fastq files is 1582 (hp1 + hp2). Thanks, Du

maiziex commented 4 years ago

Hi Du, The depth seems really low for local assembly. I have tried Aquila on libraries with 20x - 30x coverage before, and it worked fine. You could actually use the minicontigs directly to call variants if you want to try. I will upload a test version of Aquila soon (later this week or early next week) that takes a 1000 Genomes VCF as input instead of an individual VCF from FreeBayes. This could allow more reads to be extracted for local assembly. You may want to try it for chr6. Best, Maizie

GeorgeBGM commented 4 years ago

Hi Maizie, Thanks for your help! It seems that my local depth on chromosome 6 is really low, so I will try calling variants directly from the minicontigs. I also want to confirm: if a region has no small variants, or contains large SVs, will it be excluded before local assembly? Besides, I would be glad to run your new test version of Aquila. Looking forward to your update. Best, Du

distilledchild commented 1 year ago

@maiziex Hi, I have exactly the same errors, with a bunch of error messages like the one below. In my case, on chr1, the median coverage was 90X, but I am not sure what to do. Could you give me some advice?

== Error == system call for: "['/tools/envs/aquila/lib/python3.7/site-packages/bin/SPAdes-3.13.0-Linux/bin/spades-core', '/assembly/aquila/output/SHR_1/Local_Assembly_by_chunks/chr1_files_cutPBHC/fastq_by_14587583_14587749_hp1_spades_assembly/K21/configs/config.info']" finished abnormally, err code: -6
0:00:01.828 20M / 3G INFO K-mer Index Building (kmer_index_builder.hpp : 314) Building perfect hash indices
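
When there are many messages like this, it may help to pull the affected chunk names out of the log before deciding what to do. The following is a minimal sketch, assuming that every failure line contains the phrase `finished abnormally` and a path component of the form `fastq_by_<start>_<end>_hp<N>_spades_assembly`, as in the message above. The function name and the idea of saving the messages to a text file are illustrative assumptions, not Aquila features.

```python
import re

# Assumption: failing chunks appear in the path as
# fastq_by_<start>_<end>_hp<N>_spades_assembly, as in the message above.
CHUNK_RE = re.compile(r"fastq_by_(\d+)_(\d+)_(hp\d+)_spades_assembly")


def failed_chunks(log_text):
    """Return sorted, de-duplicated (start, end, haplotype) tuples.

    Only lines containing "finished abnormally" are inspected.
    """
    found = set()
    for line in log_text.splitlines():
        if "finished abnormally" not in line:
            continue
        for start, end, hap in CHUNK_RE.findall(line):
            found.add((int(start), int(end), hap))
    return sorted(found)
```

Passing the text of the saved messages to `failed_chunks` gives one entry per failing chunk, which can then be matched against the fastq files in the chunk folder.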

maiziex commented 1 year ago

Hi, Can you check the size of the fastq file "fastq_by_14587583_14587749_hp1" under the folder "Local_Assembly_by_chunks/chr1_files_cutPBHC"? One possibility is that this file is too small because of low coverage. You can skip it for local assembly.

You can also try installing the "spades" assembler and assembling this fastq on its own to see whether you get the same error.
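
The size check could be applied to every chunk at once rather than one file at a time. The sketch below lists the smallest fastq files in a chunk folder. It is not part of Aquila: it assumes the fastq files are direct children of the folder with names starting with `fastq_by_`, and `max_bytes` is a hypothetical cutoff that you would have to choose yourself, since no threshold is given in this thread.

```python
from pathlib import Path


def small_fastq_files(folder, max_bytes, prefix="fastq_by_"):
    """Return (name, size_in_bytes) for matching files of at most `max_bytes`.

    Results are sorted smallest first, with ties broken by name.
    Assumption: fastq files are regular files directly inside `folder`
    whose names start with `prefix`; subdirectories are ignored.
    """
    hits = []
    for p in Path(folder).iterdir():
        if p.is_file() and p.name.startswith(prefix):
            size = p.stat().st_size
            if size <= max_bytes:
                hits.append((p.name, size))
    return sorted(hits, key=lambda item: (item[1], item[0]))
```

Comparing this list with the chunks named in the error messages would show whether the failures line up with unusually small inputs.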

distilledchild commented 1 year ago

@maiziex Thank you for your fast reply! I also have a question. The genome I am using is from a rat, so it has 21 chromosome pairs (20 pairs and XY). Do I have to run this for 1, 2, 3...20, and 23 (for X)?

maiziex commented 1 year ago

Yes, that's right. You need to run it for each chromosome.

Best, Maizie

https://lab.vanderbilt.edu/maizie-zhou-lab/

