mquinodo / AutoMap

Tool to find regions of homozygosity (ROHs) from sequencing data.

can it be used on non-human data? #11

Closed by FatihSarigol 2 years ago

FatihSarigol commented 2 years ago

I suppose it should work if I replace the repeats file it uses with a repeats file for my species, but would it be wrong to do that? If it is really not possible, please state clearly on the program description page that it only works for human data. I did notice that sentence in the paper, but I could not tell whether it really does not work with any other species. The restriction makes sense for the online version, because you can't have repeat datasets for all species, but why for the Unix version?

Another question: what if I don't include repeat coordinates at all?

Thanks.

mquinodo commented 2 years ago

Dear Fatih,

I created a new repository with AutoMap for non-human data, where you do not need a repeats file: https://github.com/mquinodo/AutoMap-nonhuman

I hope that you will be able to use it.

Best, Mathieu

FatihSarigol commented 2 years ago

Thanks Mathieu for the quick build! I tried it out today in various ways.

Starting from 2571793 variants after filtering, step 3 brought 95630 regions down to only 22.

I ran the default settings, which I can of course change, but does it seem realistic that the filtering steps removed that many regions? Or could something unexpected have happened because the data is non-human, something we overlooked?

I am thinking about editing the plotting scripts to use the chromosome names of my species. As a side question, how does the "Common ROHs to multiple individuals" plot work? Does it plot each sample separately, or together, highlighting the ROHs shared between samples? That could be a nice addition to your program, I think. (Currently the non-human version also looks for the make_graph_common.R file, which I believe you removed because of the chromosome names, so this is a reminder to remove it from the bash script too.)

The test VCF file contains only SNPs, so I can guess the answer from that, but does AutoMap need the non-SNP sites to be in the VCF file? With a complete VCF it takes considerably longer, I suppose because of the file size (and not because it is doing the analysis any differently). If that doesn't matter, you could add a suggestion to filter a VCF for SNP sites only before running it.
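For illustration, the SNP-only pre-filter suggested above could look roughly like the sketch below. It is not part of AutoMap: the function names `is_snp_record` and `filter_snps` are made up for this example, and it assumes a plain-text, tab-separated VCF in which a SNP is a record whose REF and every ALT allele is a single A/C/G/T base. Whether AutoMap gives identical results on such a filtered file is not confirmed in this thread.

```python
BASES = {"A", "C", "G", "T"}


def is_snp_record(line: str) -> bool:
    """Return True if a VCF data line has a single-base REF and only single-base ALT alleles."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        return False  # malformed or truncated record
    ref, alt = fields[3], fields[4]
    if ref.upper() not in BASES:
        return False
    # Multi-allelic sites list ALT alleles separated by commas.
    return all(allele.upper() in BASES for allele in alt.split(","))


def filter_snps(lines):
    """Yield header lines unchanged and only those data lines that are SNPs."""
    for line in lines:
        if line.startswith("#") or is_snp_record(line):
            yield line


# Small in-memory example (hypothetical records, not real data).
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\n",    # SNP, kept
    "chr1\t200\t.\tAT\tA\t50\tPASS\t.\n",   # deletion, dropped
    "chr1\t300\t.\tC\tT,G\t50\tPASS\t.\n",  # multi-allelic SNP, kept
]
kept = list(filter_snps(example))
```

On a real file, the same generator could be fed an open file handle and its output written to a new VCF. A dedicated VCF tool would likely be faster on large files.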

Here are a couple of side suggestions: add a link to your publication to the GitHub description page. AutoMap also works with Perl v5.16.3 (after editing the noperl rule inside the script); at least I can confirm that for the test data I get exactly the same result.

Thanks

mquinodo commented 2 years ago

Thank you for your answer.

The numbers are OK if the sample is not inbred. The common-regions graph is an additional graph showing only the common regions. Yes, I removed the script because of chromosome names and sizes, and I will also remove it from the main script. AutoMap only requires a VCF file, not a gVCF. As the initial tool is intended for human data only, I think a link from the GitHub page to the non-human version is sufficient.

Best, Mathieu