johnbradley closed this issue 3 years ago
@wodanaz Should consensus_sequences.fasta be made up of just the *.cleaned.fasta files instead of all *.fasta files?
The cat *.fasta command will also include some *.masked.fasta files.
Code that creates the .cleaned.fasta and .masked.fasta files: https://github.com/wodanaz/Assembling_viruses/blob/7e7dc31dce71194ce8055e0752812fdf9b0150a1/scripts/run-bcftools-consensus.sh#L26-L27
Correct, we should use *cleaned.fasta
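A minimal sketch of the narrower glob agreed on above. The sample file names and sequences are hypothetical stand-ins for the per-sample outputs of run-bcftools-consensus.sh; only the cat line reflects the proposed change.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory so the demo does not touch real data.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical per-sample files standing in for the pipeline's outputs.
printf '>sampleA\nACGT\n' > sampleA.cleaned.fasta
printf '>sampleB\nTTGA\n' > sampleB.cleaned.fasta
printf '>sampleA\nACNT\n' > sampleA.masked.fasta

# Concatenate only the cleaned consensus files; *.masked.fasta is not matched.
cat ./*.cleaned.fasta > consensus_sequences.fasta

# Count the FASTA records that ended up in the combined file.
record_count=$(grep -c '^>' consensus_sequences.fasta)
echo "records: $record_count"
```

A side benefit of the narrower glob is that it can never match consensus_sequences.fasta itself if the step is re-run in the same directory.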
@wodanaz The grep command is searching two files: lineage_report.csv and sars-cov2-example.csv.
Should we be searching just sars-cov2-example.csv ($Project_name.csv)?
The pangolin command creates a single csv file with a default name of lineage_report.csv, but since we are specifying --outfile it would create something like sars-cov2-example.csv.
grep -E 'B.1.351|B.1.1.7|P.1|P.2|B.1.427|B.1.429|B.1.526' lineage_report.csv sars-cov2-example.csv > sars-cov2-example_lineages_of_concern.csv
Sorry, it should have been a single file: the output from pangolin.
I added sars-cov2-example.csv to represent the name variable given in the -i flag at the beginning of the pipeline.
That means the pangolin run should use the value of -i as its output name.
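As a sketch of the single-file search described above, the following runs grep against only the $Project_name.csv report. The CSV contents are a hypothetical two-column stand-in (real pangolin reports have more columns), and pangolin itself is not invoked. Escaping the dots and matching the lineage as a whole comma-separated field are assumptions added here, not part of the original command; in particular, this stricter pattern excludes sublineages, which may or may not be what the pipeline wants.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Same value as the -i flag used in the thread.
Project_name="sars-cov2-example"

# Hypothetical stand-in for the report written by pangolin --outfile.
cat > "${Project_name}.csv" <<'EOF'
taxon,lineage
sample1,B.1.1.7
sample2,B.1.2
sample3,P.1
sample4,B.1.351
sample5,B.1.177
EOF

# Lineages from the original command, with the dots escaped so that
# "." matches a literal period instead of any character.
lineages='B\.1\.351|B\.1\.1\.7|P\.1|P\.2|B\.1\.427|B\.1\.429|B\.1\.526'

# Search the single report file and match the lineage as a whole field.
# grep exits with status 1 when nothing matches; accept that as an empty result.
grep -E "(^|,)(${lineages})(,|\$)" "${Project_name}.csv" \
  > "${Project_name}_lineages_of_concern.csv" || [ $? -eq 1 ]

cat "${Project_name}_lineages_of_concern.csv"
```

With the unescaped pattern from the original command, B.1.1.7 would also match B.1.177, because each "." matches any character; the escaped, field-anchored form keeps sample5 out of the output.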
Add a step to the end of the pipeline that runs the pangolin software on consensus sequences.
Notes from @wodanaz
pangolin consensus_sequences.fasta --outfile $Project_name (same as "-i sars-cov2-example" from the dds project)
grep -E 'B.1.351|B.1.1.7|P.1|P.2|B.1.427|B.1.429|B.1.526' lineage_report.csv sars-cov2-example.csv > sars-cov2-example_lineages_of_concern.csv