hillerlab / make_lastz_chains

Portable solution to generate genome alignment chains using lastz
MIT License

Fragmented genomes #42

Open giovannaVeiga opened 10 months ago

giovannaVeiga commented 10 months ago

Hi everyone,

Do you think I can run make_lastz_chains on fragmented genomes? One has 757 scaffolds and the other has 60,750. I already used RagTag to improve the assemblies, and they are already masked. I am using a 32-CPU server to run the analysis, and I am planning to run TOGA too.

Thanks in advance

MichaelHiller commented 10 months ago

Yes, TOGA has the ability to join orthologous gene fragments that are split across scaffolds. The flag to enable this is now on by default.

It is always a bit hard to predict how well this works, but we showed in Fig. 4 that this can be quite effective.

giovannaVeiga commented 9 months ago

Hi,

Thank you for the response! I read the article again and I think it will work on my data. My only issue is that aligning the fragmented genomes was taking too long, and I was wondering whether this alignment can be done in a reasonable time.

If possible, I would like to ask another question. I am using non-model species as reference genomes, so the isoform and U12 files are usually not available. Do you think this will interfere with the annotation? For some species I have Ensembl annotations that I would use as the reference input for TOGA, but all my other data use RefSeq annotations. Do you think this will affect downstream analyses, such as OrthoFinder?

Again, thank you for your time and help.

Best regards,

Giovanna


Giovanna Selleghin Veiga
PhD at Laboratory of Evolutionary Genomics, Department of Genetics and Molecular Biology, University of Campinas (UNICAMP), Brazil

MichaelHiller commented 9 months ago

Right, information on U12 introns and comprehensive isoform knowledge is more difficult to get for other species. Of course, having them would help. I wouldn't worry too much about U12 introns, as there are only 500-700 U12 introns in total and some also have GT-AG splice sites.

But if you include several isoforms per gene, list which transcripts belong to which gene in the isoform file. Otherwise, TOGA will treat each isoform (transcript) as a separate gene and the orthology types will be wrong.
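For anyone building the isoform file from an existing annotation, here is a minimal Python sketch. It rests on assumptions that are not confirmed in this thread: that the annotation is a GTF with `gene_id` and `transcript_id` attributes, and that TOGA expects a two-column, tab-separated file with the gene ID first and the transcript ID second (please check the TOGA README for the exact format). The function names are hypothetical, and the transcript IDs would also need to match the names used in the reference annotation given to TOGA.

```python
import csv
import io
import re

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gene_transcript_pairs(gtf_lines):
    """Return sorted, unique (gene_id, transcript_id) pairs from GTF lines."""
    pairs = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene = attrs.get("gene_id")
        transcript = attrs.get("transcript_id")
        # Gene-level records have no (or an empty) transcript_id: skip them.
        if gene and transcript:
            pairs.add((gene, transcript))
    return sorted(pairs)


def write_isoforms(pairs, out):
    """Write one 'gene<TAB>transcript' line per pair (assumed layout)."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(pairs)


# Tiny in-memory example; G1, T1 and T2 are made-up identifiers.
example = [
    'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "";\n',
    'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T2";\n',
]
pairs = gene_transcript_pairs(example)
buffer = io.StringIO()
write_isoforms(pairs, buffer)
print(buffer.getvalue(), end="")
```

With a real annotation, the same functions could be fed an open file handle instead of the example list, and the output written to a file that is then passed to TOGA as the isoform file.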