Open Robvh-git opened 4 years ago
Hi Robbert, thanks again for trying the pipeline and finding the mistakes in the scripts. I went through the scripts you mentioned and they are indeed outdated. Now I have have a script that runs the pipeline starting from step 4 until the end, which will overcome the formatting issues. Also, for this version of the script, it will not be necessary to edit the .csv matrix. I will upload it and update the readme information still this week. I am just doing some final tests. Best regards, Manuel
Hi manual,
again, thank you for the quick reply. I am looking forward to the new script and the updated readme.
Thanks!
Robbert
Hi Robber, I updated the pipeline. Please check it out and let me know if you have any questions or if you find any problem. Best, Manuel
Hi Manuel,
thank you for updating the pipeline, the description looks much clearer now. Very nice that there is 1 script ( Sequence_Allele_Call.py
) that replaces 4 others.
Best, Robbert
Hello Manuel,
I'm further experimenting with your scripts, but I receive an error which I cannot fix when running script 6:
python correct_allele_sequence.py ./script5_output/Ti1_TG.fasta ./script4_output_test/ 1 ./script6_output/output_file.fasta
I receive following error:
parameter 1
./script5_output/Ti1_TG.fasta
:Ti1_TG.fasta
is a fasta file containing the 2 consensus sequences from this marker (i.e. a heterozygotous marker). I obtained this by merging the 2script 5
output files of this marker by executingcat Ti1_TG_Sample1_Al_399_C_25_70.fasta Ti1_TG_Sample1_Al_401_C_77_70.fasta > Ti1_TG.fasta
. Is this the correct way of obtaining this file? SoTi1_TG.fasta
looks like:This is just from 1 marker as a test, but I have these files for all markers from 1 sample you used in the Nile Tilapia research. In the file from this marker, no
Ns
are present. In the description you state for the first parameter ofscript 6
:fasta file containing all consensus sequences from one marker.
, but could you clarify this? So per sample and marker, there can be 1 (if homozygote) or 2 (if heterozygote) consensus sequences in this file? Or do I misinterpret this?Parameter 2
./script4_output_test/
: this is the script 4 output of markerTi1_TG
. This folder contains 2 filesTi1_TG_Sample1_Al_399.fasta
andTi1_TG_Sample1_Al_401.fasta
, so the output from script 4 of this marker and sample.parameter 3:
1
: I use this number just as testparameter 4:
./script6_output/output_file.fasta
: the directory and file name where the output of this script should be saved.Is this the correct input? I'm wondering if the reason for the error happend 'upstream' in
scripts 3, 4 or 5
as I noticed some errors there.In script 4, the description of the script seems outdated. Upon inspecting the script, I noticed that there are actually 4 arguments required instead of 3 as described. Also, I had to manually edit the
.csv
output file fromscript 3
before being able to use it inscript 4
: I had to perform following adjustments: 1)changetoo little reads
to a0
. 2) delete_mancheck
if present 3) in the first row (i.e. the header) I had to delete_1
and_2_
from the marker information. If I did not delete this, the script will search for.fastq
files with_1
and_2
in it, where are not present in the output from script 1.I had to manually edit
script 5
as the output names where not correct. I did 3 adjustments which are marked in the attached file with comments#ROB EDIT
: get_consensus_and_freq_edit.txtI hope I described the problem and the origin of the files clear enough. Could you please advice me on how to solve this error?
Kind regards, Robbert