Closed kfuku52 closed 2 years ago
Yeah, this functionality needs some love, I agree. The way it works is that it looks for an Index folder (has to be capitalized) in the working directory and tries to find any file that starts with the Genus_species prefix. But that's where it stops.
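For illustration, the lookup described above could be sketched roughly as follows. This is not amalgkit's actual code; the function name `find_index_files` and its arguments are hypothetical, and only the "Index folder plus Genus_species prefix" rule is taken from the comment.

```python
from pathlib import Path


def find_index_files(work_dir, species):
    """Return files in <work_dir>/Index whose names start with Genus_species.

    Hypothetical helper: it only mirrors the prefix rule described in the
    thread and does not check that a match is a valid index.
    """
    prefix = species.replace(" ", "_")
    index_dir = Path(work_dir) / "Index"  # capitalized, as the thread notes
    if not index_dir.is_dir():
        return []
    return sorted(
        p for p in index_dir.iterdir()
        if p.is_file() and p.name.startswith(prefix)
    )
```

Because any file with a matching prefix is accepted, a fasta file placed in the Index folder would be returned just like a real index.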
The assumption is that there should only be index files in the Index directory, so they should already have been run through kallisto index or something comparable.
At some point we obsoleted an index builder through amalgkit, so that's something the user has to do on their own before running quant.
I'll make the messages clearer.
I wonder if there is a way to verify if something is an actual index file and not a random text or fasta file that found its way into the Index folder.
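One cheap heuristic, sketched below, is to reject files that look like FASTA or plain text. It assumes that a real index is a binary file, which is not confirmed anywhere in this thread, and it would not catch a gzipped fasta. The function name `is_suspicious_index` is hypothetical.

```python
def is_suspicious_index(path, sample_size=4096):
    """Return True if the file is empty, starts like FASTA, or is plain ASCII.

    Heuristic only (assumption: genuine index files are binary). A False
    result does not prove that the file is a valid index.
    """
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    if not sample:
        return True  # an empty file cannot be a usable index
    if sample.lstrip().startswith(b">"):
        return True  # FASTA records begin with a '>' header line
    try:
        sample.decode("ascii")
    except UnicodeDecodeError:
        return False  # non-ASCII bytes: probably binary
    return b"\x00" not in sample  # ASCII without NUL bytes reads as text
```

A check like this could at least turn "index found" into a warning when the match is obviously a text file.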
index builder
I missed that functionality, and probably we should keep it rather than obsoleting it. Could you add a wiki page for the usage? I'll give it a try and send you my feedback.
We obsoleted it because all it did was call kallisto index. There was no automatic index building in place at the time, and there was no functional difference between running kallisto index and amalgkit quant --index_build.
But that was long before we started using metadata for everything in amalgkit. Now an automatic index builder is possible, since metadata is a required input anyway.
I'll look into this!
OK, please fix/update amalgkit for this issue if needed and write instructions for indexing.
Bump for myself!
I have reintroduced index building to quant.
The way this works is the following:
--fasta_dir PATH: indicates the folder containing fasta files with species names; required for building an index
--build_index yes|no: enables the functionality
It goes through quant as normal. If --build_index yes is set, quant will look for the currently processed species, according to the metadata file, in the index directory first. If there is no available index for that species, quant will try to find a matching fasta file in the --fasta_dir and start building the index. quant will then immediately go on quantifying samples as normal.
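The decision flow described above could be sketched like this. It is not the code from the commits linked in this comment; `plan_index`, `FASTA_SUFFIXES`, and the `.idx` output name are assumptions made for illustration. Only the `kallisto index -i <index> <fasta>` invocation is the standard kallisto command line.

```python
from pathlib import Path

# Assumed set of fasta extensions; the real detection rules may differ.
FASTA_SUFFIXES = (".fa", ".fasta", ".fa.gz", ".fasta.gz")


def plan_index(species, index_dir, fasta_dir):
    """Return (index_path, command); command is None if an index exists.

    Hypothetical helper that mirrors the described order: check the index
    directory first, then fall back to a matching fasta in fasta_dir.
    """
    prefix = species.replace(" ", "_")
    index_dir, fasta_dir = Path(index_dir), Path(fasta_dir)
    if index_dir.is_dir():
        existing = sorted(
            p for p in index_dir.iterdir()
            if p.is_file() and p.name.startswith(prefix)
        )
        if existing:
            return existing[0], None  # reuse the index that is already there
    fastas = []
    if fasta_dir.is_dir():
        fastas = sorted(
            p for p in fasta_dir.iterdir()
            if p.name.startswith(prefix) and p.name.endswith(FASTA_SUFFIXES)
        )
    if not fastas:
        raise FileNotFoundError(f"no index or fasta found for {prefix}")
    index_path = index_dir / f"{prefix}.idx"  # assumed naming scheme
    command = ["kallisto", "index", "-i", str(index_path), str(fastas[0])]
    return index_path, command
```

When a command is returned, a caller would run it (for example with `subprocess.run(command, check=True)`) before quantification starts.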
https://github.com/kfuku52/amalgkit/commit/6f7971b83bb6be07b2c7de9922fcf521c13e94ca and https://github.com/kfuku52/amalgkit/commit/23c824c750dd4bcc540fc2a68144bf7faa62449d (safer fasta file detection)
I'm not sure what sanity does here. It says that the index is found and shows the PATH to the reference fasta I put in amalgkit_out/Index. Should the fasta be in another directory? Also, the help messages are not user-friendly. In many cases, "Index" does not make sense if you don't mention the "reference fasta".
Minor point: Does the default "Index" dir have to be in title case even though none of the other subdirs have a capitalized first letter?