SergeyBaikal closed this issue 2 years ago
Hi Sergey,
To get the files for vConTACT2 from a single Cenote-Taker 2 run, go to the base directory of your run and type these commands. It should be possible to combine these files from several runs; just make sure there is only one header line in the genes-to-genomes file.
# specify summary file:
SUMMARY="test_ssd0_4ct_CONTIG_SUMMARY.tsv"
# make files for VContact2
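echo "protein_id,contig_id,keywords" > vcontact2_gene_to_genome1.csv
tail -n+2 "$SUMMARY" | cut -f2,4 | while read -r VIRUS END ; do
  # contigs with DTR ends use the ".rotate" protein file
  if [[ "$END" == "DTR" ]] ; then
    AA=$( find . -type f -name "${VIRUS}.rotate.AA.sorted.fasta" )
  else
    AA=$( find . -type f -name "${VIRUS}.AA.sorted.fasta" )
  fi
  # skip contigs with no protein file (otherwise grep/cat would read from the pipe)
  [[ -n "$AA" ]] || continue
  # one "protein_id,contig_id" row per FASTA header
  grep -F ">" $AA | cut -d " " -f1 | sed 's/>//g' | while read -r LINE ; do echo "${LINE},${VIRUS}" ; done >> vcontact2_gene_to_genome1.csv
  # note: this appends, so delete vcontact2_all_proteins.faa before re-running
  cat $AA >> vcontact2_all_proteins.faa
done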
This will make two files: vcontact2_all_proteins.faa and vcontact2_gene_to_genome1.csv
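For combining several runs, something like the following might work. It is only a sketch: the function name `combine_vcontact2_runs`, the output prefix, and the assumption that each run directory already contains the two files made above are mine, not part of Cenote-Taker 2. It writes the header once and skips the header line of each per-run CSV.

```shell
# Hypothetical helper (not part of Cenote-Taker 2): merge per-run vConTACT2 inputs.
# Usage: combine_vcontact2_runs OUT_PREFIX RUN_DIR [RUN_DIR ...]
# Assumes each RUN_DIR holds vcontact2_gene_to_genome1.csv and vcontact2_all_proteins.faa.
combine_vcontact2_runs() {
  cv_out="$1"
  shift
  # start both outputs fresh so re-running does not append duplicates
  echo "protein_id,contig_id,keywords" > "${cv_out}_gene_to_genome.csv"
  : > "${cv_out}_all_proteins.faa"
  for cv_run in "$@" ; do
    # drop each run's header line, keep its data rows
    tail -n +2 "${cv_run}/vcontact2_gene_to_genome1.csv" >> "${cv_out}_gene_to_genome.csv"
    cat "${cv_run}/vcontact2_all_proteins.faa" >> "${cv_out}_all_proteins.faa"
  done
}
```

For example, `combine_vcontact2_runs combined run1 run2` would write combined_gene_to_genome.csv and combined_all_proteins.faa in the current directory. If contig names can repeat between runs, they would need to be made unique first.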
I hope this helps!
Thank you very much for your prompt response, Mike! Yes, it helped me.
Good afternoon! Could you please tell me which of these files contains all of the annotated proteins? Which one can be used, for example, for a vConTACT analysis?
all_LIN_HMM2_proteins.AA.fasta
all_LIN_rps_proteins.AA.fasta
all_LIN_sort_genome_proteins.AA.fasta
all_prunable_rps_proteins.AA.fasta
all_prunable_seq_proteins.AA.fasta