Four times in the code there's a line like
if head -1 reference.fasta | grep -e 'dDocent' reference.fasta 1>/dev/null; then
The intent is to take the first line of the reference FASTA with head, which holds the name of the first chromosome/scaffold, and pass it to grep to check whether the file was created by dDocent. However, grep is also given the reference file as an argument, so it ignores the piped input and scans the reference file itself, which can easily be gigabytes of uncompressed data. The command should be
if head -1 reference.fasta | grep -e 'dDocent' 1>/dev/null; then
without the file name, so that grep reads only the single line piped from head and the unnecessary scan is avoided.
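A minimal sketch of the difference, using a throwaway FASTA file rather than a real reference. The function names `is_ddocent_reference` and `whole_file_check` and the file `mixed.fasta` are hypothetical and exist only for this illustration; they are not part of dDocent. Besides the extra work, the two forms can disagree when the string appears only after the first line:

```shell
#!/usr/bin/env bash
# Sketch only: both functions are hypothetical helpers, not dDocent code.

# Succeeds only if the FIRST line of the given FASTA contains "dDocent".
is_ddocent_reference() {
    # grep has no file operand here, so it reads the single line piped in by head.
    head -n 1 "$1" | grep -q 'dDocent'
}

# Reproduces the reported pattern: grep is given a file operand, ignores stdin,
# and searches the file itself instead of just the first line.
whole_file_check() {
    head -n 1 "$1" | grep -e 'dDocent' "$1" 1>/dev/null
}

demo_dir=$(mktemp -d)
# The first header is not from dDocent, but a later header contains the string.
printf '>chr1\nACGT\n>dDocent_Contig_2\nACGT\n' > "$demo_dir/mixed.fasta"

if whole_file_check "$demo_dir/mixed.fasta"; then
    echo "whole-file check: match"
else
    echo "whole-file check: no match"
fi

if is_ddocent_reference "$demo_dir/mixed.fasta"; then
    echo "first-line check: match"
else
    echo "first-line check: no match"
fi
```

On this sample the whole-file form reports a match while the first-line form does not, because only the second header contains the string.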
The four occurrences are:
FIRST INSTANCE https://github.com/jpuritz/dDocent/blob/9718247b7f533a71057787d77c5232b6b97065c5/dDocent#L341
SECOND INSTANCE https://github.com/jpuritz/dDocent/blob/9718247b7f533a71057787d77c5232b6b97065c5/dDocent#L424
THIRD INSTANCE, slightly different in the searched string. The issue remains. https://github.com/jpuritz/dDocent/blob/9718247b7f533a71057787d77c5232b6b97065c5/dDocent#L341
FOURTH INSTANCE https://github.com/jpuritz/dDocent/blob/9718247b7f533a71057787d77c5232b6b97065c5/dDocent#L424