maracashay / DAWG-Helpline

Need help deciding what step to do or what specific commands to pass when analyzing your amplicon data? Ask AWAY! DAWG is here to help :)

Renaming sequences in a multiple sequence alignment based on a metadata file #4

Open samueljmt opened 4 years ago

samueljmt commented 4 years ago

I have a multiple sequence alignment (.fa file) in which the sequences are named by gene ID. I also have a metadata Excel table with two columns (gene IDs and taxonomic names). I want to replace the gene IDs in the sequence names with the matching taxonomic names. Attached: metadata_gapA.xlsx

maracashay commented 4 years ago

Hello there,

I think the easiest way to do this is in R. I had to rename the sequences in my own .fa file, and it was fairly simple in R using a column from a metadata file.

I did something like this:

uniquesToFasta(OTUs, fout = 'rep-seq-fulls.fa', ids = paste0(colnames(ps@otu_table)))

uniquesToFasta is part of the dada2 package in R.

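So your code would be something like:

uniquesToFasta(Sequences, fout = 'rep-seq-full.fa', ids = paste0(metadata_gapA$Taxonomy))

Here Sequences stands for your sequences loaded into R, and metadata_gapA for your Excel table read in as a data frame, with Taxonomy as the column holding the taxonomic names. The ids are assigned by position, one per sequence, so the metadata rows need to be in the same order as the sequences.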

If you prefer to try to do things in the command line, there are several options here you can try.
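As one command-line possibility, here is a minimal awk sketch. It assumes the Excel sheet has been exported to a tab-separated file with the gene ID in the first column and the taxonomic name in the second, and that each FASTA header starts with the gene ID. The file names (metadata.tsv, alignment.fa, renamed.fa) and the example records are made up for illustration. Spaces in the names are replaced with underscores, since some downstream tools cut headers at the first space; drop that line if you want the names unchanged.

```shell
#!/bin/sh
# Hypothetical example inputs, small enough to check by eye.
# metadata.tsv: gene ID <TAB> taxonomic name (assumed export of the Excel sheet).
printf 'gene001\tEscherichia coli\ngene002\tBacillus subtilis\n' > metadata.tsv

# alignment.fa: aligned sequences whose headers are gene IDs.
printf '>gene001\nATG-CGT\n>gene002\nATGACGT\n>gene003\nATG--GT\n' > alignment.fa

# First file (NR == FNR): load the ID -> name lookup table.
# Second file: rewrite header lines whose ID is in the table and
# print every other line (sequences, unmatched headers) unchanged.
awk -F '\t' '
    NR == FNR { name[$1] = $2; next }
    /^>/ {
        id = substr($0, 2)              # drop the leading ">"
        sub(/[ \t].*/, "", id)          # keep only the first word of the header
        if (id in name) {
            n = name[id]
            gsub(/ /, "_", n)           # optional: avoid spaces in headers
            print ">" n
            next
        }
    }
    { print }
' metadata.tsv alignment.fa > renamed.fa

cat renamed.fa
```

Headers with no match in the metadata (gene003 in this example) are left as they are, which makes it easy to spot IDs missing from the table. If the tab-separated file was saved on Windows, strip the carriage returns first (for example with tr -d '\r'), otherwise they end up inside the new names.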

Good luck!