ecbaker7-tamu closed this 5 months ago
Hi, @MichaelHiller and @kirilenkobm. I just wanted to add details to this request. We understand this is probably an issue because our chromosome names follow the "NC_XXX.Y" format, as discussed in issues #8 and #3 and noted in the README. Still, we would appreciate knowing the best way to rename our input files so that we can use your custom script, standalone_scripts/rename_chromosomes_back.py, to rename them back afterward.
We want to use make_lastz_chain and TOGA to check whether the ortholog detection using RefSeq on some of our genomes could be improved compared to OrthoFinder, so we want to keep the GCF accession. I am sure the solution is very straightforward, and I am sorry for the silly question. Bioinformatics is hard, even after many years!! o(>.<)o
Thanks for your time!
I believe I've fixed it myself. The code I used is below; I saved it into a .sh file.
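input_file="input-name.fna"
output_file="input-name_only_contig_name.fna"
# Keep only the sequence ID: drop everything after the first whitespace in each header line
sed -e 's/^\(>[^[:space:]]*\).*/\1/' "$input_file" > "$output_file"
echo "Processing complete. Output written to $output_file."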
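In case a Python version is handier, here is a rough sketch of the same idea. It is not part of make_lastz_chain or TOGA, and `strip_fasta_descriptions` and the file names are made up for illustration. It keeps everything up to the first whitespace in each header (so the accession and its version suffix survive), stops if two records would end up with the same ID, and streams line by line so a large genome never has to be loaded into memory.

```python
import sys


def strip_fasta_descriptions(in_path, out_path):
    """Copy a FASTA file, truncating each header at its first whitespace.

    Returns the number of headers written. Raises ValueError if a header
    has no ID or if two records end up with the same shortened ID.
    """
    seen = set()
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                # ">ID free-text description" -> "ID"
                fields = line[1:].split(None, 1)
                seq_id = fields[0] if fields else ""
                if not seq_id or seq_id in seen:
                    raise ValueError(f"empty or duplicate ID: {line.rstrip()!r}")
                seen.add(seq_id)
                dst.write(f">{seq_id}\n")
            else:
                # Sequence lines are copied unchanged.
                dst.write(line)
    return len(seen)


# Hypothetical command-line use: python strip_headers.py in.fna out.fna
if __name__ == "__main__" and len(sys.argv) == 3:
    count = strip_fasta_descriptions(sys.argv[1], sys.argv[2])
    print(f"Processing complete. {count} headers written to {sys.argv[2]}.")
```

Because only the free-text description is dropped, the IDs in the output are still the original RefSeq accessions.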
Thx for sharing this. This is the easiest solution.
Hi, I am trying to run this pipeline through Anaconda and I am getting the following error:
Error! File: data/Nitens-and-Greg/GCF_021461395.2_iqSchAmer2.1_genomic.fna - detected space-or-tab-containing sequence: NC_060119.1 Schistocerca americana isolate TAMUIC-IGC-003095 chromosome 1, iqSchAmer2.1, whole genome shotgun sequence Please exclude or fix sequences with spaces and tabs.
From what I can tell, it is a problem with the formatting of the headers in the FASTA file, as suggested on the GitHub page. I have seen the script to rename after running, but there is no script to rename before. Do you have a script I can use, or a suggestion on how to fix this error? I have included a screenshot of the full run below.
I am trying to run this on very large RefSeq genomes (~8Gb) and I would appreciate any help you can offer!
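With genomes this size, it may be worth checking the headers before starting a long run. Below is a minimal sketch of such a check. It is not the pipeline's own validation code, and `headers_with_whitespace` is a hypothetical helper name. It simply lists the first few header lines that contain a space or a tab, which is the condition the error message above complains about.

```python
def headers_with_whitespace(fasta_path, limit=10):
    """Return up to `limit` headers (without the '>') that contain a space or tab."""
    bad = []
    with open(fasta_path) as handle:
        for line in handle:
            if not line.startswith(">"):
                continue  # skip sequence lines
            header = line[1:].rstrip("\r\n")
            if " " in header or "\t" in header:
                bad.append(header)
                if len(bad) >= limit:
                    break  # enough examples; no need to read the rest
    return bad
```

An empty list means no header in the file contains a space or tab.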