malonge / RagTag

Tools for fast and flexible genome assembly scaffolding and improvement
MIT License

How to determine the number of ref.fasta. #116

Open abcyulongwang opened 2 years ago

abcyulongwang commented 2 years ago

Hello!

RagTag is great software. I am a newbie and have two small questions.

The first one: in the commands "ragtag.py scaffold -o out_1 ref1.fasta query.fasta" and "ragtag.py scaffold -o out_2 ref2.fasta query.fasta", "ref1.fasta" and "ref2.fasta" represent different genomes of the same species. If the species has many genomes, how many ref.fasta files is it best to use?

The second: I ran "ragtag.py patch ./final_result/ragtag.scaffold.fasta xxx.contigs.fasta -t 10" and got a lot of output files, which makes it hard to figure out which file to use next. Should I continue by running "ragtag merge" with "ragtag.patch.fasta" as the input file? And is the "ragtag patch" step necessary, given that it takes a long time to run?

Best, yulong

VictorCalderon commented 1 year ago

Hello, I will try to answer your questions one by one:

If the species has many genomes, how many ref.fasta should you use?

I don't entirely understand what you mean by a species having multiple genomes. Perhaps you are referring to chromosomes? If not, you may have found multiple assemblies of your target species' genome; in that case, I suggest you find the best assembly and use it as the reference. You could also check whether NCBI has an assembly for the species you're studying.
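
If there are several candidate assemblies, one rough way to compare them is contiguity. The following is a minimal Python sketch that computes N50 from a FASTA file. The function names (`read_fasta_lengths`, `n50`) are made up for illustration, and treating N50 as a stand-in for "best" is an assumption here: completeness, annotation quality, and how closely the reference is related to your sample matter as well.

```python
def read_fasta_lengths(path):
    """Return the length of every sequence in a FASTA file, in file order."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Largest length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


# Hypothetical usage, with the file names from the question:
# for candidate in ["ref1.fasta", "ref2.fasta"]:
#     print(candidate, n50(read_fasta_lengths(candidate)))
```

A higher N50 only says the assembly is in fewer, longer pieces; it does not tell you whether those pieces are correct.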

RagTag patch produces many files, which one should I use?

As mentioned in the documentation, RagTag would rather create new files than modify old, expensive ones, so these intermediate files shouldn't worry you too much. Simply use ragtag.patch.fasta, as the documentation suggests.
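
As a rough illustration of which file to carry forward, the sketch below builds the patch command from the question and returns the path of the one output you would normally use next. The helper names (`patch_cmd`, `patched_assembly`) are hypothetical, and the `-o` output-directory flag is taken from the scaffold commands earlier in this thread and assumed to behave the same way for patch.

```python
import shlex
from pathlib import Path


def patch_cmd(target, query, out_dir, threads=10):
    """Build (but do not run) a 'ragtag.py patch' command line as an argv list."""
    # Assumption: '-o' sets the output directory for patch, as it does for scaffold.
    return ["ragtag.py", "patch", "-o", str(out_dir), "-t", str(threads),
            str(target), str(query)]


def patched_assembly(out_dir):
    """Path of the final patched assembly; other files in out_dir are intermediates."""
    return Path(out_dir) / "ragtag.patch.fasta"


if __name__ == "__main__":
    cmd = patch_cmd("./final_result/ragtag.scaffold.fasta",
                    "xxx.contigs.fasta", "patch_out")
    print(shlex.join(cmd))                 # copy into a shell, or pass cmd to subprocess.run
    print(patched_assembly("patch_out"))   # the file to use for downstream steps
```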

Should I be using RagTag merge with ragtag.patch.fasta as input?

From the documentation:

"RagTag merge is a tool to merge and reconcile different scaffoldings of the same assembly."

So if you're using various methods to obtain a better draft, it is an option that RagTag gives you. If that is not the case, and you are just scaffolding one genome with one approach, consider skipping this tool, as it may not help you substantially.
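
To make the merge workflow concrete, here is a small sketch that only builds the command lines for scaffolding one query against several references and then merging the results. The helper name `plan_multi_reference` and the `out_N` directory naming are illustrative, and the AGP file name `ragtag.scaffold.agp` is assumed from the `ragtag.scaffold.fasta` naming seen in the question. In this pattern, merge is given the original query assembly plus the AGP files from each scaffolding run, not a FASTA produced by an earlier step, so please check the merge usage message of your installed version before relying on it.

```python
import shlex


def plan_multi_reference(query, references):
    """Return argv lists: one scaffold run per reference, then a single merge."""
    commands = []
    agp_files = []
    for index, reference in enumerate(references, start=1):
        out_dir = f"out_{index}"
        commands.append(["ragtag.py", "scaffold", "-o", out_dir, reference, query])
        # Assumed output name; verify it against the files in your own out_dir.
        agp_files.append(f"{out_dir}/ragtag.scaffold.agp")
    # merge reconciles the different scaffoldings of the same query assembly
    commands.append(["ragtag.py", "merge", query, *agp_files])
    return commands


if __name__ == "__main__":
    for command in plan_multi_reference("query.fasta", ["ref1.fasta", "ref2.fasta"]):
        print(shlex.join(command))  # nothing is executed; this is a dry run
```

With a single reference there is only one AGP file and nothing to reconcile, which is why merge can be skipped in that case.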

Should I use RagTag patch for my assembly?

If you want to fill gaps using a previously sequenced genome, or an earlier scaffold that represents some regions of interest better, and doing so won't obstruct downstream analysis, then definitely. It is very important to state this in your methods, since you used existing data to build the new assembly.
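
Since patch can take a long time, it may help to first see how many gaps there are to fill. The sketch below counts runs of N in a FASTA file. The function name `gap_stats` is hypothetical, and treating every run of N (including runs at sequence ends) as a gap is a simplifying assumption.

```python
import re

GAP_RUN = re.compile(r"[Nn]+")


def gap_stats(path):
    """Return (number of N runs, total N bases) across all sequences in a FASTA file."""
    n_gaps = 0
    gap_bases = 0
    chunks = []  # lines of the record currently being read

    def flush():
        nonlocal n_gaps, gap_bases
        # Join lines first so a gap that spans a line break is counted once.
        sequence = "".join(chunks)
        chunks.clear()
        for match in GAP_RUN.finditer(sequence):
            n_gaps += 1
            gap_bases += match.end() - match.start()

    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                flush()
            else:
                chunks.append(line.strip())
    flush()
    return n_gaps, gap_bases


# Hypothetical usage: compare the assembly before and after patching.
# print(gap_stats("./final_result/ragtag.scaffold.fasta"))
# print(gap_stats("patch_out/ragtag.patch.fasta"))
```

If the scaffold has few gaps, or the gaps fall outside the regions you care about, skipping patch is a reasonable choice. This sketch holds one full sequence in memory at a time, which is fine for a quick check but not tuned for very large chromosomes.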

Please feel free to ask anything regarding my answers.