iobio / vcf.iobio.io

MIT License

Non-Model Organism Visualization #62

Open C-Pauli opened 5 years ago

C-Pauli commented 5 years ago

Is there a way to load a non-model organism as the reference genome?

AlistairNWard commented 5 years ago

We currently only support the human genome and one build of the mouse genome. Our plan was to expand the available reference genomes based on user request. Are there any particular non-model organism reference genomes you are looking to use?

Alistair Ward, PhD Co-founder | President | COO Frameshift Genomics Inc.

Director, Research and Science Eccles Institute of Human Genetics University of Utah School of Medicine

C-Pauli commented 5 years ago

Hi Alistair,

Thanks for getting back to me so quickly, and I'm excited that the platform could be expanded to other genomes.

I currently work with Cannabis genomics which is close to having a complete closed assembly. We currently have two linkage maps available, placing large genomic regions on the 10 diploid chromosomes of Cannabis, as well as three WGS strains that have a complete representative sequence of the genome; however, the best assembly is still in roughly 500 contigs.

Could this genome be included in the vcf.iobio platform in its current state, or will we need to wait until we finish closing the genome and the chromosomes are fully assembled?

Thanks in advance for your time and help, Christopher Pauli

AlistairNWard commented 5 years ago

Hey Christopher,

It's certainly something that we can look into. The biggest problem is that with so many contigs, contig selection would be difficult. The wheel in the top left that lets you select a contig would likely be unusable, and the Variant Density chart at the top right would be broken into so many reference sequences that it might also be a problem. Another question is whether the different contigs have similar variant attributes to each other. vcf.iobio works by sampling variants across the VCF file, so it only works if the sampled statistics quickly converge to reasonable values.

All that being said, we can take a look into it. Do you have any files that you could share? Specifically a reference fasta and a vcf?

It will definitely be easier with a fully assembled genome, so no promises, but we might as well try!!
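To illustrate the convergence concern, here is a minimal sketch of checking whether a running mean over randomly sampled values settles down. It is not vcf.iobio's sampling code; the function names, the window and tolerance parameters, and the synthetic QUAL-like values are all assumptions made for illustration.

```python
import random


def running_means(values, sample_size, seed=0):
    """Randomly sample `sample_size` values and return the running mean after each draw."""
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)
    means, total = [], 0.0
    for count, value in enumerate(sample, start=1):
        total += value
        means.append(total / count)
    return means


def has_converged(means, window=100, tolerance=0.01):
    """True if the last `window` running means are all within a relative
    `tolerance` of the final running mean."""
    if len(means) < window:
        return False
    final = means[-1]
    return all(abs(m - final) <= tolerance * abs(final) for m in means[-window:])


# Synthetic stand-in for a per-variant attribute (hypothetical values):
# most variants come from one distribution, a minority from another,
# mimicking contigs whose variant attributes differ from each other.
gen = random.Random(1)
values = [gen.gauss(40, 5) for _ in range(9000)] + [gen.gauss(15, 5) for _ in range(1000)]

means = running_means(values, sample_size=2000)
print("converged after 2000 samples:", has_converged(means))
```

If the contigs differ strongly and sampling is not spread evenly across them, an estimate like this would be expected to take longer to stabilize.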

C-Pauli commented 5 years ago

Hi, thanks for getting back to me!

Here's a link to a drive folder with a reference genome in the least number of complete contigs, with a vcf file and index using that genome as a reference. https://drive.google.com/file/d/1f8zB80ljmdDVqO4JI_6k9g5wj9KW6fC_/view?usp=sharing

There are also chromosome-level scaffolds now, but they have not been fully completed and thus have gaps, and I'm unsure how that would affect vcf.iobio. Here is a link to the chromosome-level assemblies, https://www.ncbi.nlm.nih.gov/assembly/GCA_000230575.4, if that would help with integrating this genome into your platform!

I'm excited to see if this works and thank you again for trying.

Best wishes, Christopher Pauli

AlistairNWard commented 5 years ago

Thanks Christopher,

We are looking at making that reference available within vcf.iobio and also at perhaps limiting the number of reference sequences shown in the wheel to try and not overwhelm the app. This will take a bit of effort and updating of our databases and such, so it will take us a little bit of time. We'll let you know if it was successful as soon as possible!

tonydisera commented 5 years ago

Hi Christopher -

Thanks for your patience. Al and I are looking at the vcf file that you provided. We noticed that the reference names are different than those in the fasta file.

[Screenshot: Screen Shot 2019-03-12 at 2 32 09 PM]

When Al looked at the fasta file, he found these reference names:

000000F|arrow 32623077
000001F|arrow 24762745
000002F|arrow 24267864
000003F|arrow 292355
000004F|arrow 22512959
000005F|arrow 20753567
000006F|arrow 20410247
000007F|arrow 465927
000008F|arrow 16150351

Should we use the reference names found in the vcf contig header?

Best regards, Tony
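A mismatch like this can be checked with a short script. The sketch below compares the sequence names in a FASTA file with the IDs in a VCF's `##contig` header lines. The function names are hypothetical, and the VCF contig ID `scaffold_2` is an invented placeholder, since the actual names in the VCF are only visible in the screenshot.

```python
import re

# Matches the ID field of a VCF "##contig=<...>" header line.
CONTIG_ID = re.compile(r"^##contig=<.*?\bID=([^,>]+)")


def fasta_names(lines):
    """Return sequence names: the text after '>' up to the first whitespace."""
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def vcf_contig_names(lines):
    """Return contig IDs declared in the VCF meta-information header."""
    names = []
    for line in lines:
        if not line.startswith("##"):
            break  # meta lines end at the #CHROM column header
        match = CONTIG_ID.match(line)
        if match:
            names.append(match.group(1))
    return names


def compare_names(fasta_lines, vcf_lines):
    """Return (names only in the FASTA, names only in the VCF header)."""
    fasta = set(fasta_names(fasta_lines))
    vcf = set(vcf_contig_names(vcf_lines))
    return sorted(fasta - vcf), sorted(vcf - fasta)


# Small in-memory example; with real files, pass open file handles instead.
fasta_lines = [">000000F|arrow", "ACGT", ">000001F|arrow", "GGCC"]
vcf_lines = [
    "##fileformat=VCFv4.2",
    "##contig=<ID=000000F|arrow,length=32623077>",
    "##contig=<ID=scaffold_2,length=100>",  # hypothetical mismatching name
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]
only_fasta, only_vcf = compare_names(fasta_lines, vcf_lines)
print("only in FASTA:", only_fasta)
print("only in VCF header:", only_vcf)
```

Two empty lists would suggest the VCF was called against the same reference as the FASTA, at least as far as names go.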

tonydisera commented 5 years ago

Hi Christopher -

I'm just following up. I don't have your email, so I will try posting here.

I wanted to confirm the reference names for the Cannabis genome build. Then we will make the necessary changes to support this build in vcf.iobio.

Thanks, Tony

C-Pauli commented 5 years ago

Hi Tony and Alistair,

Thanks for reaching out, it's been a little hectic recently with traveling for work!

So since we last spoke, there have been some developments and there is a better Cannabis Reference Genome now, and I believe it was my mistake with the last vcf being mapped to a different reference.

I attached a vcf and genome file of a complete primary assembly, which contains the 10 chromosomes and about 210 other unplaced contigs. Do you think it would be better to limit the website just to the regions on the 10 chromosome scaffolds and exclude the unplaced scaffolds to make the website more functional and useful?

If so, I can map the reads and generate a vcf of just the 10 chromosomes if that would be helpful!

Additionally, Cannabis also has chloroplast and mitochondrial genomes, and I'm unsure how they would be added to the site.

My personal email is Cannabis.Christopher@gmail.com, and I'm sorry again for how long it has taken me to get back to you.

CBDRx_Harvard__220_Contig.fna https://drive.google.com/a/colorado.edu/file/d/1RXnvfIzQ5u196a1gp55hhdO-CWnWvUgA/view?usp=drive_web

SRR8346822_to_Harvard_Assembly.vcf https://drive.google.com/a/colorado.edu/file/d/16QY0ZnhScx59gKa4ENns3kpJEgbnHK2e/view?usp=drive_web

Best, Christopher Pauli
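Restricting an existing VCF to the chromosome scaffolds does not necessarily require remapping. As a rough sketch, the filter below keeps only the header records and variant lines for a chosen set of contigs. The function name and the contig names `chr1` and `unplaced_1` are hypothetical; the real names depend on the assembly. Organelle sequences could be kept simply by adding their names to the wanted set.

```python
import re

# Matches the ID field inside a "##contig=<...>" header line.
CONTIG_ID = re.compile(r"[<,]ID=([^,>]+)")


def keep_contigs(vcf_lines, wanted):
    """Yield only the VCF lines that belong to the contig names in `wanted`."""
    wanted = set(wanted)
    for line in vcf_lines:
        if line.startswith("##contig=<"):
            # Keep a contig header record only for a wanted contig.
            match = CONTIG_ID.search(line)
            if match and match.group(1) in wanted:
                yield line
        elif line.startswith("#"):
            # All other header lines pass through unchanged.
            yield line
        elif line.split("\t", 1)[0] in wanted:
            # Variant record: the first column is CHROM.
            yield line


# Small in-memory example with hypothetical contig names.
lines = [
    "##fileformat=VCFv4.2\n",
    "##contig=<ID=chr1,length=1000>\n",
    "##contig=<ID=unplaced_1,length=50>\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t10\t.\tA\tG\t50\tPASS\t.\n",
    "unplaced_1\t5\t.\tC\tT\t40\tPASS\t.\n",
]
filtered = list(keep_contigs(lines, ["chr1"]))
print("".join(filtered), end="")
```

Note that this only drops records; variants called against the full reference are not recomputed, so remapping to a chromosome-only reference could still give different calls.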

yannickwurm commented 5 years ago

Hi,
I had the same question (working with a fire ant genome assembly) - even if suboptimal choice of contig.

On your website the loading seems to stall for my vcf - I guess because it's not human!

Is there a way for me to run a vcf.iobio instance locally equivalent to what you're showing a screenshot of? Thanks! Yannick

tonydisera commented 5 years ago

Hi Yannick -

Does your vcf for fire ant have the contig lengths in the VCF header records? If so, I am planning on making a modification to vcf.iobio.io to look there if a genome build is not specified.

If there is an assembly, I'm happy to load in the references for our IOBIO services.

Either way, would you be able to send me a test vcf file?

Best regards, Tony Di Sera
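For anyone checking whether their VCF carries this information, the following sketch reads the `##contig` header records and reports each contig's length, or `None` where the record has no `length` field. The function name is hypothetical. The sample names and numbers come from the FASTA listing earlier in this thread, on the assumption that those numbers are sequence lengths.

```python
import re


def contig_lengths(vcf_lines):
    """Map contig ID -> length from "##contig" header lines.

    The value is None when a record has no numeric length field.
    """
    lengths = {}
    for line in vcf_lines:
        if not line.startswith("##"):
            break  # meta lines end at the #CHROM column header
        if not line.startswith("##contig=<"):
            continue
        body = line.strip()[len("##contig=<"):].rstrip(">")
        # key=value pairs; values may be double-quoted and contain commas
        fields = dict(re.findall(r'(\w+)=("[^"]*"|[^,]*)', body))
        if "ID" in fields:
            length = fields.get("length")
            lengths[fields["ID"]] = int(length) if length and length.isdigit() else None
    return lengths


header = [
    "##fileformat=VCFv4.2",
    "##contig=<ID=000000F|arrow,length=32623077>",
    "##contig=<ID=000003F|arrow,length=292355>",
    "##contig=<ID=no_length>",  # hypothetical record lacking a length
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]
lengths = contig_lengths(header)
missing = [name for name, value in lengths.items() if value is None]
print(lengths)
print("contigs without a length:", missing)
```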

tonydisera commented 5 years ago

Hi Christopher -

You should be able to load your VCF files for Cannabis now. You can just ignore the 'Species' selection and click the load button. If it doesn't work, try deleting your browser's local cache. Please let me know if you run into any problems.

Best regards, Tony

tonydisera commented 5 years ago

Hi Christopher -

We have added a 'Not specified' to the Species dropdown. So you can select this now instead of 'Human' and it will use the VCF header to determine the contigs to display.

[Screenshot: Screen Shot 2019-05-30 at 9 55 32 AM]

Thanks for your patience! Tony

tonydisera commented 5 years ago

Hi @yannickwurm -

I just wanted to let you know that there is an option 'Not specified' in the Species dropdown. If your vcf has contig headers with the lengths, you should be able to pick 'Not specified' and load the vcf in vcf.iobio.

Best regards, Tony Di Sera

C-Pauli commented 5 years ago

Hi @tonydisera,

Thank you so much for helping me with that and it looks fantastic!

You're the best!

Best wishes, Chris Pauli

yannickwurm commented 5 years ago

Hi @tonydisera - I tried uploading but it didn't work (it stayed stuck on "accessing file headers"). Is this for the website https://vcf.iobio.io or for a local install? I'm happy to share a vcf by email if helpful.

thanks v much

tonydisera commented 5 years ago

Hi @yannickwurm,

Sorry to hear that your VCF won't load. Yes, if you could send me the VCF, that would be great.

Tony tonyads@genetics.utah.edu

tonydisera commented 5 years ago

Hi @yannickwurm ,

Thanks for sending the vcf file. The file isn't loading because of the size of the header file. It looks like there are over 67,000 contig header records in the file. I edited the vcf file, pruning the number of contig header records down to around 1000 and it will load, but only if it is served from a URL (like an Amazon S3 bucket). Our team is working on a bug fix for local files so that the load button is enabled after files are selected.

I don't have a quick fix for the large header. I think our code that reads the local binary file is not able to handle this size, but it will take some research. In the meantime, can you prune the number of contig records in the header? Meanwhile, @adityaekawade is going to fix the problem where the 'Load' button is disabled.

Best regards, Tony
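The thread does not say how the header was pruned. One possible approach, sketched below, is to drop the `##contig` records for contigs that have no variant records at all, which leaves every variant line untouched. The function name and contig names are hypothetical, and this may not reduce the header enough on its own if most contigs carry variants.

```python
import re

# Matches the ID field inside a "##contig=<...>" header line.
CONTIG_ID = re.compile(r"[<,]ID=([^,>]+)")


def prune_unused_contigs(vcf_lines):
    """Return the VCF lines without "##contig" records for contigs
    that never appear in the CHROM column of a variant record."""
    lines = list(vcf_lines)  # held in memory; a large file would need two passes
    used = {
        line.split("\t", 1)[0]
        for line in lines
        if line.strip() and not line.startswith("#")
    }
    pruned = []
    for line in lines:
        if line.startswith("##contig=<"):
            match = CONTIG_ID.search(line)
            if match and match.group(1) not in used:
                continue  # no variants on this contig: drop its header record
        pruned.append(line)
    return pruned


# Small in-memory example with hypothetical contig names.
vcf = [
    "##fileformat=VCFv4.2\n",
    "##contig=<ID=contig_1,length=5000>\n",
    "##contig=<ID=contig_2,length=800>\n",
    "##contig=<ID=contig_3,length=300>\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "contig_1\t12\t.\tA\tG\t50\tPASS\t.\n",
    "contig_3\t7\t.\tC\tT\t40\tPASS\t.\n",
]
pruned = prune_unused_contigs(vcf)
print("".join(pruned), end="")
```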

tonydisera commented 5 years ago

@adityaekawade , here is a screen capture showing the problem with the disabled load button for local vcf files.

[Screenshot: Screen Shot 2019-06-26 at 5 36 29 PM]

tonydisera commented 5 years ago

Hi @yannickwurm -

@adityaekawade has fixed the problem where the load button was disabled when 'Not specified' was selected in the Species dropdown. Would it be possible for you to prune the header records in your VCF file?

Thanks, Tony