pimbongaerts / radseq

Collection of Python scripts for parsing/analyses of RAD-seq data

how to write pyrad2fasta output to fasta file #2

Open advaudo opened 6 years ago

advaudo commented 6 years ago

Hi, I'm trying to map my consensus sequences from ipyrad to a reference genome, but there are no actual fasta files of these sequences. pyrad2fasta seems like a perfect option, but it only prints out the list of sequences on screen. Is it possible to actually create a fasta file with this function? Thanks so much for your help, Anthony

pimbongaerts commented 6 years ago

Hi Anthony. All you need to do is redirect that output to a file, using >. For example: python3 pyrad2fasta.py ipyrad_output.loci > ipyrad_output.fasta (or see example here). Hope that helps!

advaudo commented 6 years ago

That's great, thanks. I did manage to figure out that part the other day. I do have one slightly more complicated question, if you know the answer. The .loci file has dashes ("--") in some of the reads, especially at the ends; I suppose they come from sequences that don't align perfectly at the ends because of missing bases. These dashes are carried over to the .fasta file as well and seem to cause issues when mapping and BLASTing the sequences. Do you know a way to fix or remove them? Sorry if this is off topic, I'm just new to all this. Thanks, Anthony

pimbongaerts commented 6 years ago

Yes, the gaps are transferred on purpose so that sequence positions still correspond to the positions referred to in the VCF. For mapping etc., however, they can easily be removed with e.g.: $ sed 's/-//g' ipyrad_output.fasta > ipyrad_output_no_gaps.fasta. Hope that helps?
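
Note that a pattern like 's/-//g' removes every "-" in the file, including any that occur in the ">" header lines (for example in sample names). If that matters for your data, the following minimal Python sketch may be an alternative. It assumes standard FASTA layout, where header lines start with ">" and all other lines are sequence. The function name strip_gaps and the script name used below are hypothetical; they are not part of the radseq repository.

```python
import sys


def strip_gaps(lines):
    """Yield FASTA lines with '-' removed from sequence lines only.

    Assumption: header lines start with '>' (standard FASTA);
    they are passed through unchanged.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Header line: keep as-is so names containing '-' survive
            yield line
        else:
            # Sequence line: drop alignment gap characters
            yield line.replace("-", "")


def main(in_path, out_path):
    # Stream the input file line by line and write the gap-free copy
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in strip_gaps(src):
            dst.write(line + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Saved under a name of your choice (here, hypothetically, strip_gaps.py), it could be run as: python3 strip_gaps.py ipyrad_output.fasta ipyrad_output_no_gaps.fasta. Bear in mind that positions in the gap-free file no longer line up with the VCF.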