khyox / recentrifuge

Recentrifuge: robust comparative analysis and contamination removal for metagenomics
http://www.recentrifuge.org

-k for multiple directories of .krk files? #37

Closed rjsorr closed 2 years ago

rjsorr commented 2 years ago

Hi, I have >500 samples to compare and >20 controls. To keep the script simple, I'm wondering whether I can put all my control .krk files in one directory and all my sample .krk files in another, and then simply point to the two directories with the -k option, with -c giving the number of control files in the first -k directory. From the main page I see this can be done with real samples in a single directory, but can it be done with multiple directories and in combination with -c? I'm asking before testing, as I don't want to waste time moving all the files if it is not possible.

Original script (from main page): rcf -n /my/tax/dir -k CTRL1.krk -k CTRL2.krk -k X1.krk -k X2.krk -k X3.krk -c 2 -o Xsamples.rcf.html -s KRAKEN -y 25 -x 9606

Modified script (adapted from the main page example): rcf -n /my/tax/dir -k ./controls -k ./real_samples -c 2 -o Xsamples.rcf.html -s KRAKEN -y 25 -x 9606

cheers

khyox commented 2 years ago

Hi @rjsorr,

The possibility of two different directories for regular samples and control samples would be something to consider as a new/future feature. I've dealt with thousands of regular samples, but never with a large number of controls. I agree that, when you have many controls, maybe it would be a good idea to have them organized in different directories.

In your case, as a quick way to proceed now, I would suggest keeping all the files in the same directory (so that you can use the -k dir option) but renaming the controls (e.g., using the Linux rename command to change all of them in one line) so that they sort alphabetically before the regular samples, for example by adding a prefix such as 0_ to the filenames of the control samples. You can then use -c numctrls, where numctrls is the number of such controls. This is not as convenient as the two directories that you propose, but it is a simple solution that you can use now; it's what I usually do.
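The renaming step described above could be sketched roughly as follows. The helper names prefix_controls and count_controls are made up for illustration and are not part of Recentrifuge. The sketch assumes bash, that all .krk files already sit in one directory, and that no regular sample name sorts before the chosen prefix (a sample named 01_x.krk would sort before 0_CTRL1.krk, for instance).

```shell
#!/usr/bin/env bash
# Hypothetical helpers (NOT part of Recentrifuge): prefix control files so
# that they sort before the regular samples within a single directory.

# Usage: prefix_controls DIR PREFIX CTRL_FILE...
prefix_controls() {
    local dir="$1" prefix="$2"
    shift 2
    local f
    for f in "$@"; do
        # Rename only files that actually exist in DIR
        if [ -e "$dir/$f" ]; then
            mv -- "$dir/$f" "$dir/$prefix$f"
        fi
    done
}

# Usage: count_controls DIR PREFIX
# Prints how many PREFIX*.krk files DIR contains (the value to pass to -c)
count_controls() {
    local dir="$1" prefix="$2"
    find "$dir" -maxdepth 1 -type f -name "${prefix}*.krk" | wc -l | tr -d ' '
}
```

With these defined, calling prefix_controls ./all_samples 0_ CTRL1.krk CTRL2.krk and then running rcf -n /my/tax/dir -k ./all_samples -c "$(count_controls ./all_samples 0_)" -o Xsamples.rcf.html -s KRAKEN -y 25 -x 9606 would reproduce the main-page example with a single -k directory. Here ./all_samples is a placeholder directory name.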

Cheers.

rjsorr commented 2 years ago

Cheers @khyox, sounds like a good solution. I'll give it a try.

khyox commented 2 years ago

Closing after opening #42 for the improvement suggestion.