Open juansebe1 opened 3 weeks ago
The gene/protein identifiers ("lcl|NZ_CP018093.1_prot_FSC454_RS07665_1515") are new to me. Which database or program are they from? I've written a bash script that you can copy and paste into a text file in the same directory you are running OrthoRefine from. Running it will generate a file called "OrthoRefine_debug.log.txt", which you can copy or post back. As a privacy warning, it prints all the file names in the current directory, which you may want to review before posting.
#!/usr/bin/env bash
# print only the file names in the current directory
ls > OrthoRefine_debug.log.txt
echo >> OrthoRefine_debug.log.txt
#print contents of "input.txt" to OrthoRefine_debug.log.txt
cat input.txt >> OrthoRefine_debug.log.txt
echo >> OrthoRefine_debug.log.txt
# check if file "N0.tsv" exists in current directory
if [ -f "N0.tsv" ]; then
echo "N0.tsv exists" >> OrthoRefine_debug.log.txt
# check if file "N0.tsv" is in dos or unix format
if file "N0.tsv" | grep -q "CRLF"; then
echo "N0.tsv is in dos format" >> OrthoRefine_debug.log.txt
else
echo "N0.tsv is in unix format" >> OrthoRefine_debug.log.txt
fi
else
echo "N0.tsv does not exist" >> OrthoRefine_debug.log.txt
fi
echo >> OrthoRefine_debug.log.txt
# read the first column of each line from "input.txt" and write to array
IFS=$'\n' read -d '' -r -a lines < input.txt
# verify that each element of array exists as a feature table file. E.g. "GCF_000005845.2" as "GCF_000005845.2_ASM584v2_feature_table.txt"
for i in "${lines[@]}" ; do
if [ -f "$i"*_feature_table.txt ]; then
echo "$i"*_feature_table.txt exists >> OrthoRefine_debug.log.txt
else
echo "$i"_feature_table.txt does not exist >> OrthoRefine_debug.log.txt
fi
done
echo >> OrthoRefine_debug.log.txt
# verify that each element of array exists as a fasta file. E.g. "GCF_000005845.2" as "GCF_000005845.2_ASM584v2_protein.faa"
for i in "${lines[@]}" ; do
if [ -f "$i"*_protein.faa ]; then
echo "$i"*_protein.faa exists >> OrthoRefine_debug.log.txt
else
echo "$i"_protein.faa does not exist >> OrthoRefine_debug.log.txt
fi
done
echo >> OrthoRefine_debug.log.txt
# Check that both the feature table and fasta file exist for the same element of array lines
for i in "${!lines[@]}" ; do
if [ -f "${lines[$i]}"*_feature_table.txt ] && [ -f "${lines[$i]}"*_protein.faa ]; then
echo "Both feature table and fasta file exist for ${lines[$i]}" >> OrthoRefine_debug.log.txt
# Store the index of the first occurrence
first_occurrence=$i
break
fi
done
# Store the 11th column from the output above in a new array
IFS=$'\n' read -d '' -r -a new_array <<< "$(grep -m 10 "^CDS" "${lines[first_occurrence]}"*_feature_table.txt | awk '{print $11}')"
# Verify each element of new_array can be found in the associated fasta file
for i in "${new_array[@]}" ; do
if grep -q "$i" "${lines[first_occurrence]}"*_protein.faa; then
echo "$i" exists in "${lines[first_occurrence]}"*_protein.faa >> OrthoRefine_debug.log.txt
else
echo "$i" does not exist in "${lines[first_occurrence]}"*_protein.faa >> OrthoRefine_debug.log.txt
fi
done
echo >> OrthoRefine_debug.log.txt
# Verify that each element of new_array can be found in "N0.tsv"
for i in "${new_array[@]}" ; do
if grep -q "$i" N0.tsv; then
echo "$i" exists in N0.tsv >> OrthoRefine_debug.log.txt
# append the matching line(s) from N0.tsv to N0_lines, keeping earlier matches and leaving each line unsplit
tmp=$(grep "$i" N0.tsv)
N0_lines+="$tmp\n"
else
echo "$i" does not exist in N0.tsv >> OrthoRefine_debug.log.txt
fi
done
echo >> OrthoRefine_debug.log.txt
# print array N0_lines
for i in "${N0_lines[@]}" ; do
echo -e "$i" >> OrthoRefine_debug.log.txt # -e allows for newline char printing
done
# find the longest line by tabs and comma count in N0_lines
longest_line=$(echo -e "${N0_lines[@]}" | awk -F"\t|," '{print NF}' | sort -n | tail -1)
# for each element of N0_lines, extract the 4th column to end of line and store in another array
IFS=$'\n' read -d '' -r -a new_array2 <<< "$(echo -e "${N0_lines[@]}" | awk -F"\t|," -v e=$longest_line '{ for (i=4;i<e;++i) print $i }' | tr -d ' ')"
# for each element of new_array2, continue to search the feature table files 11th column until the element is found
for i in "${new_array2[@]}" ; do
found_flag=0
for j in "${lines[@]}" ; do
if grep -q "$i" "$j"*_feature_table.txt; then
echo "$i" exists in "$j"*_feature_table.txt >> OrthoRefine_debug.log.txt
found_flag=1
break
fi
done
if [ $found_flag -eq 0 ]; then
echo "$i" does not exist in any feature_table.txt >> OrthoRefine_debug.log.txt
fi
done
Hi! Those were extracted from GenBank, and then I ran OrthoFinder to obtain the 'N0.tsv' file.
Here is the result from the bash script: OrthoRefine_debug.log.txt
Are the feature table files (or genome annotation files) located in a different directory? I've attached my debug output from my test E. coli run so you can see what it should kinda look like.
GCF_000005845.2_ASM584v2_feature_table.txt
GCF_000005845.2_ASM584v2_protein.faa
GCF_013892435.1_ASM1389243v1_feature_table.txt
GCF_013892435.1_ASM1389243v1_protein.faa
GCF_016904755.1_ASM1690475v2_feature_table.txt
GCF_016904755.1_ASM1690475v2_protein.faa
GCF_902709585.1_H1-003-0086-C-F.v2_feature_table.txt
GCF_902709585.1_H1-003-0086-C-F.v2_protein.faa
N0.tsv
OrthoFinder
OrthoRefine_debug.log.txt
OrthoRefine_debug.sh
download_ft_fa.txt
input.txt
GCF_000005845.2
GCF_013892435.1
GCF_016904755.1
GCF_902709585.1
N0.tsv exists
N0.tsv is in unix format
GCF_000005845.2_ASM584v2_feature_table.txt exists
GCF_013892435.1_ASM1389243v1_feature_table.txt exists
GCF_016904755.1_ASM1690475v2_feature_table.txt exists
GCF_902709585.1_H1-003-0086-C-F.v2_feature_table.txt exists
GCF_000005845.2_ASM584v2_protein.faa exists
GCF_013892435.1_ASM1389243v1_protein.faa exists
GCF_016904755.1_ASM1690475v2_protein.faa exists
GCF_902709585.1_H1-003-0086-C-F.v2_protein.faa exists
Both feature table and fasta file exist for GCF_000005845.2
NP_414542.1 exists in GCF_000005845.2_ASM584v2_protein.faa
NP_414543.1 exists in GCF_000005845.2_ASM584v2_protein.faa
NP_414544.1 exists in GCF_000005845.2_ASM584v2_protein.faa
NP_414545.1 exists in GCF_000005845.2_ASM584v2_protein.faa
NP_414546.1 exists in GCF_000005845.2_ASM584v2_protein.faa
NP_414547.1 exists in GCF_000005845.2_ASM584v2_protein.faa
NP_414548.1 exists in GCF_000005845.2_ASM584v2_protein.faa
NP_414549.1 exists in GCF_000005845.2_ASM584v2_protein.faa
NP_414550.1 exists in GCF_000005845.2_ASM584v2_protein.faa
NP_414551.1 exists in GCF_000005845.2_ASM584v2_protein.faa
NP_414542.1 exists in N0.tsv
NP_414543.1 exists in N0.tsv
NP_414544.1 exists in N0.tsv
NP_414545.1 exists in N0.tsv
NP_414546.1 exists in N0.tsv
NP_414547.1 exists in N0.tsv
NP_414548.1 exists in N0.tsv
NP_414549.1 exists in N0.tsv
NP_414550.1 exists in N0.tsv
NP_414551.1 exists in N0.tsv
N0.HOG0000583 OG0000421 n0 NP_414542.1 WP_001386572.1 WP_001386572.1 WP_001386572.1
N0.HOG0000584 OG0000422 n0 NP_414543.1 WP_001264663.1 WP_059235060.1 WP_010378218.1
N0.HOG0000585 OG0000423 n0 NP_414544.1 WP_000252740.1 WP_000241676.1 WP_001517712.1
N0.HOG0000586 OG0000424 n0 NP_414545.1 WP_000781090.1 WP_208631050.1 WP_000781035.1
N0.HOG0000587 OG0000425 n0 NP_414546.1 WP_000771325.1 WP_000738743.1 WP_105224911.1
N0.HOG0000588 OG0000426 n0 NP_414547.1 WP_000906158.1 WP_000906164.1 WP_000906159.1
N0.HOG0000589 OG0000427 n0 NP_414548.1 WP_001112548.1 WP_059235058.1 WP_016249591.1
N0.HOG0000064 OG0000022 n4 NP_414549.1 WP_046076191.1, WP_000130195.1 WP_000130184.1 WP_000130186.1
N0.HOG0000590 OG0000428 n0 NP_414550.1 WP_046083027.1 WP_001094685.1 WP_001517716.1
N0.HOG0000591 OG0000429 n0 NP_414551.1 WP_000528529.1 WP_000528545.1 WP_000528538.1
NP_414542.1 exists in GCF_000005845.2_ASM584v2_feature_table.txt
WP_001386572.1 exists in GCF_000005845.2_ASM584v2_feature_table.txt
WP_001386572.1 exists in GCF_000005845.2_ASM584v2_feature_table.txt
WP_001386572.1 exists in GCF_000005845.2_ASM584v2_feature_table.txt
NP_414543.1 exists in GCF_000005845.2_ASM584v2_feature_table.txt
WP_001264663.1 exists in GCF_013892435.1_ASM1389243v1_feature_table.txt
WP_059235060.1 exists in GCF_016904755.1_ASM1690475v2_feature_table.txt
WP_010378218.1 exists in GCF_902709585.1_H1-003-0086-C-F.v2_feature_table.txt
NP_414544.1 exists in GCF_000005845.2_ASM584v2_feature_table.txt
WP_000252740.1 exists in GCF_013892435.1_ASM1389243v1_feature_table.txt
WP_000241676.1 exists in GCF_016904755.1_ASM1690475v2_feature_table.txt
WP_001517712.1 exists in GCF_902709585.1_H1-003-0086-C-F.v2_feature_table.txt
NP_414545.1 exists in GCF_000005845.2_ASM584v2_feature_table.txt
WP_000781090.1 exists in GCF_013892435.1_ASM1389243v1_feature_table.txt
WP_208631050.1 exists in GCF_016904755.1_ASM1690475v2_feature_table.txt
WP_000781035.1 exists in GCF_902709585.1_H1-003-0086-C-F.v2_feature_table.txt
NP_414546.1 exists in GCF_000005845.2_ASM584v2_feature_table.txt
WP_000771325.1 exists in GCF_013892435.1_ASM1389243v1_feature_table.txt
WP_000738743.1 exists in GCF_016904755.1_ASM1690475v2_feature_table.txt
WP_105224911.1 exists in GCF_902709585.1_H1-003-0086-C-F.v2_feature_table.txt
NP_414547.1 exists in GCF_000005845.2_ASM584v2_feature_table.txt
WP_000906158.1 exists in GCF_013892435.1_ASM1389243v1_feature_table.txt
WP_000906164.1 exists in GCF_016904755.1_ASM1690475v2_feature_table.txt
WP_000906159.1 exists in GCF_902709585.1_H1-003-0086-C-F.v2_feature_table.txt
NP_414548.1 exists in GCF_000005845.2_ASM584v2_feature_table.txt
WP_001112548.1 exists in GCF_013892435.1_ASM1389243v1_feature_table.txt
WP_059235058.1 exists in GCF_016904755.1_ASM1690475v2_feature_table.txt
WP_016249591.1 exists in GCF_902709585.1_H1-003-0086-C-F.v2_feature_table.txt
NP_414549.1 exists in GCF_000005845.2_ASM584v2_feature_table.txt
WP_046076191.1 exists in GCF_013892435.1_ASM1389243v1_feature_table.txt
WP_000130195.1 exists in GCF_013892435.1_ASM1389243v1_feature_table.txt
WP_000130184.1 exists in GCF_016904755.1_ASM1690475v2_feature_table.txt
NP_414550.1 exists in GCF_000005845.2_ASM584v2_feature_table.txt
WP_046083027.1 exists in GCF_013892435.1_ASM1389243v1_feature_table.txt
WP_001094685.1 exists in GCF_016904755.1_ASM1690475v2_feature_table.txt
WP_001517716.1 exists in GCF_902709585.1_H1-003-0086-C-F.v2_feature_table.txt
NP_414551.1 exists in GCF_000005845.2_ASM584v2_feature_table.txt
WP_000528529.1 exists in GCF_013892435.1_ASM1389243v1_feature_table.txt
WP_000528545.1 exists in GCF_016904755.1_ASM1690475v2_feature_table.txt
WP_000528538.1 exists in GCF_000005845.2_ASM584v2_feature_table.txt
Yes, I had the feature table and protein.faa files in another directory. Now I have them in the same directory, but the debug script's OrthoRefine_debug.log.txt output says *_feature_table.txt does not exist.
./orthorefine.exe still doesn't run. What else is missing, or what am I doing wrong?
Thanks
I made a mistake in the debug script: it did not account for the second and third columns you are using in the input file. I also made a change to print the first line of the N0.tsv file so I can see it. Can you recopy the script below and run it again? If we keep encountering errors, we can drop the second and third columns from the input file to see if that is the problem.
#!/usr/bin/env bash
# print only the file names in the current directory
ls > OrthoRefine_debug.log.txt
echo >> OrthoRefine_debug.log.txt
#print contents of "input.txt" to OrthoRefine_debug.log.txt
cat input.txt >> OrthoRefine_debug.log.txt
echo >> OrthoRefine_debug.log.txt
# check if file "N0.tsv" exists in current directory
if [ -f "N0.tsv" ]; then
echo "N0.tsv exists" >> OrthoRefine_debug.log.txt
head -1 N0.tsv >> OrthoRefine_debug.log.txt
# check if file "N0.tsv" is in dos or unix format
if file "N0.tsv" | grep -q "CRLF"; then
echo "N0.tsv is in dos format" >> OrthoRefine_debug.log.txt
else
echo "N0.tsv is in unix format" >> OrthoRefine_debug.log.txt
fi
else
echo "N0.tsv does not exist" >> OrthoRefine_debug.log.txt
fi
echo >> OrthoRefine_debug.log.txt
# read the first column of each line from "input.txt" and write to array
IFS=$'\n' read -d '' -r -a lines <<< "$(cut -f1 input.txt)"
# verify that each element of array exists as a feature table file. E.g. "GCF_000005845.2" as "GCF_000005845.2_ASM584v2_feature_table.txt"
for i in "${lines[@]}" ; do
if [ -f "$i"*_feature_table.txt ]; then
echo "$i"*_feature_table.txt exists >> OrthoRefine_debug.log.txt
else
echo "$i"_feature_table.txt does not exist >> OrthoRefine_debug.log.txt
fi
done
echo >> OrthoRefine_debug.log.txt
# verify that each element of array exists as a fasta file. E.g. "GCF_000005845.2" as "GCF_000005845.2_ASM584v2_protein.faa"
for i in "${lines[@]}" ; do
if [ -f "$i"*_protein.faa ]; then
echo "$i"*_protein.faa exists >> OrthoRefine_debug.log.txt
else
echo "$i"_protein.faa does not exist >> OrthoRefine_debug.log.txt
fi
done
echo >> OrthoRefine_debug.log.txt
# Check that both the feature table and fasta file exist for the same element of array lines
for i in "${!lines[@]}" ; do
if [ -f "${lines[$i]}"*_feature_table.txt ] && [ -f "${lines[$i]}"*_protein.faa ]; then
echo "Both feature table and fasta file exist for ${lines[$i]}" >> OrthoRefine_debug.log.txt
# Store the index of the first occurrence
first_occurrence=$i
break
fi
done
# Store the 11th column from the output above in a new array
IFS=$'\n' read -d '' -r -a new_array <<< "$(grep -m 10 "^CDS" "${lines[first_occurrence]}"*_feature_table.txt | awk '{print $11}')"
# Verify each element of new_array can be found in the associated fasta file
for i in "${new_array[@]}" ; do
if grep -q "$i" "${lines[first_occurrence]}"*_protein.faa; then
echo "$i" exists in "${lines[first_occurrence]}"*_protein.faa >> OrthoRefine_debug.log.txt
else
echo "$i" does not exist in "${lines[first_occurrence]}"*_protein.faa >> OrthoRefine_debug.log.txt
fi
done
echo >> OrthoRefine_debug.log.txt
# Verify that each element of new_array can be found in "N0.tsv"
for i in "${new_array[@]}" ; do
if grep -q "$i" N0.tsv; then
echo "$i" exists in N0.tsv >> OrthoRefine_debug.log.txt
# append the matching line(s) from N0.tsv to N0_lines, keeping earlier matches and leaving each line unsplit
tmp=$(grep "$i" N0.tsv)
N0_lines+="$tmp\n"
else
echo "$i" does not exist in N0.tsv >> OrthoRefine_debug.log.txt
fi
done
echo >> OrthoRefine_debug.log.txt
# print array N0_lines
for i in "${N0_lines[@]}" ; do
echo -e "$i" >> OrthoRefine_debug.log.txt # -e allows for newline char printing
done
# find the longest line by tabs and comma count in N0_lines
longest_line=$(echo -e "${N0_lines[@]}" | awk -F"\t|," '{print NF}' | sort -n | tail -1)
# for each element of N0_lines, extract the 4th column to end of line and store in another array
IFS=$'\n' read -d '' -r -a new_array2 <<< "$(echo -e "${N0_lines[@]}" | awk -F"\t|," -v e=$longest_line '{ for (i=4;i<e;++i) print $i }' | tr -d ' ')"
# for each element of new_array2, continue to search the feature table files 11th column until the element is found
for i in "${new_array2[@]}" ; do
found_flag=0
for j in "${lines[@]}" ; do
if grep -q "$i" "$j"*_feature_table.txt; then
echo "$i" exists in "$j"*_feature_table.txt >> OrthoRefine_debug.log.txt
found_flag=1
break
fi
done
if [ $found_flag -eq 0 ]; then
echo "$i" does not exist in any feature_table.txt >> OrthoRefine_debug.log.txt
fi
done
I was checking the feature tables and protein.faa files that come in the 'pub_data' folder, and they don't look like mine, so I extracted mine directly from the genome assembly index, and now they look similar. I ran OrthoFinder again, added the new 'N0.tsv', and tried ./orthorefine.exe, but the 'Segmentation fault' persists.
And if I drop the 2nd and 3rd columns from 'input.txt', I get an 'Error feature table file missing' message.
This is the new debug log: OrthoRefine_debug.log.txt. In lines 51 and 56 of this file you can see that a pair of protein sequences 'do not exist in N0.tsv'.
The proteins missing from N0.tsv aren't causing the crash. That message is there to warn me that they were present in the feature table but not grouped by OrthoFinder into a HOG, so I shouldn't expect to see them.
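To see which proteins fall into that "present in the feature table but not in any HOG" category, something along these lines could be used. It is only a sketch: `list_ungrouped_proteins` is a hypothetical helper that is not part of OrthoRefine, and it assumes a tab-separated feature table with the protein accession in column 11 (the column the debug script reads). The demo builds its own small throwaway files rather than using real data.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OrthoRefine): list protein accessions that
# appear in CDS rows of a feature table but nowhere in the OrthoFinder file.
# Assumes a tab-separated feature table with the accession in column 11.
list_ungrouped_proteins() {
    local feature_table=$1 hog_file=$2
    awk -F'\t' '$1 == "CDS" && $11 != "" { print $11 }' "$feature_table" |
        sort -u |
        while IFS= read -r id; do
            # -F: literal match, -w: whole word, so NP_1.1 does not match NP_1.10
            grep -qwF -- "$id" "$hog_file" || printf '%s\n' "$id"
        done
}

# Demo on throwaway files: NP_1.1 is grouped into a HOG, NP_2.1 is not.
demo_dir=$(mktemp -d)
printf 'CDS\tx\tx\tx\tx\tx\tx\tx\tx\tx\tNP_1.1\n' >  "$demo_dir/ft.txt"
printf 'CDS\tx\tx\tx\tx\tx\tx\tx\tx\tx\tNP_2.1\n' >> "$demo_dir/ft.txt"
printf 'N0.HOG0000000\tOG0000000\tn0\tNP_1.1\n' > "$demo_dir/N0.tsv"
ungrouped=$(list_ungrouped_proteins "$demo_dir/ft.txt" "$demo_dir/N0.tsv")
echo "$ungrouped"
rm -rf "$demo_dir"
```

An empty result would mean every CDS accession in the feature table appears somewhere in the OrthoFinder file.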
I downloaded the data files today and was able to run OrthoRefine (using the cpp file from the main and from the GFF branch) with both input files:
GCF_000219045.1
GCF_001885235.1
GCF_003347095.1
Note that in this second input file, the columns need to be separated by tabs, not spaces:
GCF_000219045.1 c b
GCF_001885235.1 c b
GCF_003347095.1 c b
With the command:
./orthorefine.exe --input input.txt --OF_file N0.tsv --window_size 8 --synteny_ratio 0.5
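Since the three-column input file has to be tab-separated, a quick check along these lines can flag lines that contain spaces instead. This is only a sketch: `find_space_separated_lines` is a hypothetical helper, not part of OrthoRefine, and it assumes that no legitimate field in the input file contains a space.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OrthoRefine): print every line of the given
# file that contains a space, prefixed with its line number.
# Assumes no legitimate field in the input file contains a space.
find_space_separated_lines() {
    # grep exits non-zero when nothing matches; "|| true" keeps that from
    # being treated as a failure.
    grep -n ' ' "$1" || true
}

# Demo on a throwaway file: line 1 uses tabs, line 2 uses spaces.
demo_file=$(mktemp)
printf 'GCF_000219045.1\tc\tb\nGCF_001885235.1 c b\n' > "$demo_file"
bad_lines=$(find_space_separated_lines "$demo_file")
echo "$bad_lines"
rm -f "$demo_file"
```

On a real file, `find_space_separated_lines input.txt` printing nothing would mean no line uses a space as a separator.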
By genome assembly index, do you mean https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/? If not, OrthoRefine includes a script to download the feature table files and protein fasta from NCBI.
./download_ft_fafiles.sh input.txt
I would recommend checking the input.txt file and the command used to call OrthoRefine. If that doesn't work and you obtained the data from somewhere else besides the ftp website I linked above, make a new directory and use the download_ft_fafiles script to obtain the feature table and fasta files and rerun OrthoFinder with OrthoRefine. If you continue to experience a crash issue, I'll have to write debugger instructions for GDB so I can know which line of code is causing it. I've attached OrthoRefine's output file for these three test data below.
HOG SOG Gene_name GCF_000219045.1_ASM21904v1_feature_table.txt GCF_001885235.1_ASM188523v1_feature_table.txt GCF_003347095.1_ASM334709v1_feature_table.txt
N0.HOG0000000 0.0 glycosyltransferase F7308_RS04340 CGC43_RS01820
N0.HOG0000001 1.0 glycosyltransferase family 2 protein FSC454_RS03920 CGC43_RS03135
N0.HOG0000004 4.0 Bcr/CflA family efflux MFS transporter F7308_RS09935 CGC43_RS08655
N0.HOG0000014 14.0 hypothetical protein F7308_RS10380 FSC454_RS10085
N0.HOG0000014 14.1 hypothetical protein F7308_RS10385 FSC454_RS10090 CGC43_RS09225
N0.HOG0000016 16.0 lipid A export permease/ATP-binding protein MsbA F7308_RS03050 FSC454_RS08395 CGC43_RS07930
N0.HOG0000018 18.0 lysophospholipid acyltransferase family protein F7308_RS09625 FSC454_RS09200 CGC43_RS08785
N0.HOG0000019 19.0 oligopeptide:H+ symporter F7308_RS05250 FSC454_RS03585 CGC43_RS02785
N0.HOG0000019 19.1 oligopeptide:H+ symporter F7308_RS06680 FSC454_RS05345
N0.HOG0000021 21.0 DUF3568 family protein F7308_RS09380 FSC454_RS09105 CGC43_RS08485
N0.HOG0000034 34.0 adenosylmethionine--8-amino-7-oxononanoate transaminase F7308_RS06600 FSC454_RS05265 CGC43_RS04285
N0.HOG0000036 36.0 hypothetical protein FSC454_RS01460 CGC43_RS07380
N0.HOG0000041 41.0 site-specific integrase F7308_RS02895 FSC454_RS07130
N0.HOG0000042 42.0 beta-ketoacyl-ACP synthase II F7308_RS04900 FSC454_RS03230 CGC43_RS02285
N0.HOG0000047 47.0 linear amide C-N hydrolase F7308_RS05335 FSC454_RS03700 CGC43_RS02870
N0.HOG0000047 47.1 linear amide C-N hydrolase F7308_RS07330 FSC454_RS04815
N0.HOG0000052 52.0 pyridoxal phosphate-dependent aminotransferase F7308_RS06130 FSC454_RS04485 CGC43_RS05500
N0.HOG0000052 52.1 pyridoxal phosphate-dependent aminotransferase F7308_RS09335 FSC454_RS09060
N0.HOG0000063 63.0 LysR family transcriptional regulator F7308_RS05745 FSC454_RS04215 CGC43_RS03300
N0.HOG0000064 64.0 DUF3573 domain-containing protein FSC454_RS02090, FSC454_RS02095 CGC43_RS06810
N0.HOG0000065 65.0 IS5 family transposase FSC454_RS09255 CGC43_RS08730
N0.HOG0000067 67.0 glycosyl hydrolase family 18 protein F7308_RS02810 FSC454_RS08580 CGC43_RS08115
N0.HOG0000068 68.0 DUF3568 family protein F7308_RS00145 FSC454_RS00125
N0.HOG0000068 68.1 DUF3568 family protein F7308_RS04665 FSC454_RS07225 CGC43_RS02075
N0.HOG0000070 70.0 phosphatase PAP2 family protein FSC454_RS06215 CGC43_RS03495
N0.HOG0000073 73.0 restriction endonuclease subunit S F7308_RS01485 CGC43_RS07515
N0.HOG0000074 74.0 pilin F7308_RS02110 FSC454_RS01950 CGC43_RS06950
N0.HOG0000075 75.0 LysR substrate-binding domain-containing protein F7308_RS02005 FSC454_RS01835 CGC43_RS07090
N0.HOG0000076 76.0 DNA primase phage associated F7308_RS02915 FSC454_RS09945, FSC454_RS09950
N0.HOG0000076 76.1 toprim domain-containing protein F7308_RS04495 FSC454_RS07110
N0.HOG0000077 77.0 LysR family transcriptional regulator F7308_RS02945 FSC454_RS08495 CGC43_RS08030
N0.HOG0000078 78.0 efflux RND transporter periplasmic adaptor subunit F7308_RS03035 FSC454_RS08410 CGC43_RS07945
N0.HOG0000082 82.0 SDR family oxidoreductase F7308_RS05750 FSC454_RS04220 CGC43_RS03305
N0.HOG0000083 83.0 3-oxoacyl-ACP reductase FabG F7308_RS04910 FSC454_RS03240 CGC43_RS02295
N0.HOG0000084 84.0 diaminopimelate decarboxylase F7308_RS03550 FSC454_RS07985 CGC43_RS07580
N0.HOG0000085 85.0 glycosyltransferase family 2 protein F7308_RS04565 FSC454_RS07330 CGC43_RS02000
N0.HOG0000086 86.0 glycosyltransferase family 2 protein FSC454_RS03915 CGC43_RS03130
N0.HOG0000095 95.0 outer membrane beta-barrel protein F7308_RS10125 FSC454_RS09920 CGC43_RS09175
N0.HOG0000096 96.0 class A beta-lactamase F7308_RS06050 FSC454_RS06185
N0.HOG0000096 96.1 class A beta-lactamase F7308_RS07675 FSC454_RS06710 CGC43_RS05145
N0.HOG0000097 97.0 ATP-binding cassette domain-containing protein F7308_RS08020 FSC454_RS06375 CGC43_RS05645
N0.HOG0000103 103.0 acyl carrier protein F7308_RS04905 FSC454_RS03235 CGC43_RS02290
N0.HOG0000105 105.0 aspartate carbamoyltransferase F7308_RS00095 FSC454_RS00070 CGC43_RS00050
N0.HOG0000106 106.0 hypothetical protein F7308_RS00480 FSC454_RS00460
N0.HOG0000107 107.0 YoaK family protein F7308_RS00410 FSC454_RS00370 CGC43_RS00325
N0.HOG0000108 108.0 aromatic amino acid transport family protein F7308_RS00415 FSC454_RS00375 CGC43_RS00330
N0.HOG0000109 109.0 NAD-dependent succinate-semialdehyde dehydrogenase F7308_RS00570 FSC454_RS00555 CGC43_RS00465
N0.HOG0000110 110.0 ABC transporter permease subunit F7308_RS00645 FSC454_RS00630 CGC43_RS00520
N0.HOG0000111 111.0 amidohydrolase family protein FSC454_RS06805 CGC43_RS05235
N0.HOG0000112 112.0 NAD(P)H:quinone oxidoreductase F7308_RS00875 FSC454_RS00865 CGC43_RS00795
N0.HOG0000113 113.0 alpha-hydroxy acid oxidase F7308_RS01025 FSC454_RS01035
N0.HOG0000114 114.0 cysteine synthase family protein F7308_RS05090 FSC454_RS03445 CGC43_RS02475
N0.HOG0000115 115.0 class II fumarate hydratase F7308_RS01040 FSC454_RS01050 CGC43_RS01015
N0.HOG0000118 118.0 YciI family protein F7308_RS04895 FSC454_RS03225 CGC43_RS02280
N0.HOG0000119 119.0 site-specific tyrosine recombinase XerD F7308_RS03265 FSC454_RS08140 CGC43_RS07715
N0.HOG0000121 121.0 SulP family inorganic anion transporter F7308_RS08435 FSC454_RS02915
N0.HOG0000122 122.0 hypothetical protein F7308_RS01640 FSC454_RS01455 CGC43_RS07385
N0.HOG0000123 123.0 alanine racemase F7308_RS08130 FSC454_RS04715 CGC43_RS05750
N0.HOG0000124 124.0 APC family permease F7308_RS06770 FSC454_RS05440 CGC43_RS04415
N0.HOG0000125 125.0 ATP-binding cassette domain-containing protein F7308_RS01740 FSC454_RS01545 CGC43_RS07305
N0.HOG0000126 126.0 sulfite exporter TauE/SafE family protein F7308_RS01940 FSC454_RS01765 CGC43_RS07145
N0.HOG0000128 128.0 S-(hydroxymethyl)glutathione dehydrogenase/class III alcohol dehydrogenase F7308_RS02085 FSC454_RS01920 CGC43_RS06975
N0.HOG0000129 129.0 PepSY domain-containing protein F7308_RS02950 FSC454_RS08490 CGC43_RS08025
N0.HOG0000130 130.0 helix-turn-helix transcriptional regulator F7308_RS05240 FSC454_RS03575 CGC43_RS02775
N0.HOG0000131 131.0 MFS transporter F7308_RS08460 FSC454_RS02890 CGC43_RS06030
N0.HOG0000132 132.0 amidophosphoribosyltransferase F7308_RS02510 FSC454_RS08925 CGC43_RS08360
N0.HOG0000133 133.0 hypothetical protein F7308_RS02535 FSC454_RS08895
N0.HOG0000134 134.0 MFS transporter F7308_RS02830 FSC454_RS08560 CGC43_RS08095
N0.HOG0000135 135.0 hypothetical protein F7308_RS02910 FSC454_RS10095, FSC454_RS03065
N0.HOG0000137 137.0 efflux RND transporter permease subunit F7308_RS03030 FSC454_RS08415 CGC43_RS07950
N0.HOG0000140 140.0 glycine C-acetyltransferase F7308_RS08550 FSC454_RS02835 CGC43_RS02740
N0.HOG0000143 143.0 DegT/DnrJ/EryC1/StrS family aminotransferase F7308_RS04285 CGC43_RS01845
N0.HOG0000144 144.0 glucosyltransferase domain-containing protein FSC454_RS08120 CGC43_RS07705
N0.HOG0000145 145.0 bifunctional UDP-N-acetylglucosamine diphosphorylase/glucosamine-1-phosphate N-acetyltransferase GlmU F7308_RS09270 FSC454_RS02260 CGC43_RS06610
N0.HOG0000146 146.0 ATP-binding cassette domain-containing protein F7308_RS08445 FSC454_RS02905 CGC43_RS06015
N0.HOG0000147 147.0 glycosyltransferase family 1 protein F7308_RS05575 FSC454_RS03925 CGC43_RS03140
N0.HOG0000148 148.0 MFS transporter F7308_RS03025 CGC43_RS07955
N0.HOG0000150 150.0 DotU family type IV/VI secretion system protein F7308_RS05015 FSC454_RS03355 CGC43_RS02400
N0.HOG0000151 151.0 type VI secretion system baseplate subunit TssF/IglH F7308_RS05020 FSC454_RS03360 CGC43_RS02405
N0.HOG0000152 152.0 type VI secretion system lipoprotein IglE F7308_RS05040 FSC454_RS03380 CGC43_RS02425
N0.HOG0000153 153.0 DEAD/DEAH box helicase F7308_RS06675 FSC454_RS05340 CGC43_RS04355
N0.HOG0000154 154.0 KpsF/GutQ family sugar-phosphate isomerase F7308_RS05520 FSC454_RS03845 CGC43_RS03025
N0.HOG0000155 155.0 MFS transporter F7308_RS05630 FSC454_RS04100 CGC43_RS03190
N0.HOG0000156 156.0 nuclease-related domain-containing protein F7308_RS05770 FSC454_RS04235 CGC43_RS03330
N0.HOG0000158 158.0 hypothetical protein F7308_RS05820, F7308_RS05825 FSC454_RS04280
N0.HOG0000159 159.0 sugar porter family MFS transporter F7308_RS07425 FSC454_RS06865
N0.HOG0000160 160.0 LysR substrate-binding domain-containing protein F7308_RS06420 FSC454_RS05040 CGC43_RS03825
N0.HOG0000161 161.0 FUSC family protein F7308_RS00430 FSC454_RS00390 CGC43_RS00345
N0.HOG0000164 164.0 alpha/beta hydrolase fold domain-containing protein F7308_RS07040 FSC454_RS05795 CGC43_RS04020
N0.HOG0000165 165.0 bifunctional methionine sulfoxide reductase B/A protein F7308_RS07290 FSC454_RS04835 CGC43_RS04635
N0.HOG0000166 166.0 cation:proton antiporter F7308_RS07730 FSC454_RS06765 CGC43_RS05200
N0.HOG0000167 167.0 OmpA family protein F7308_RS08075 FSC454_RS04765 CGC43_RS05700
N0.HOG0000168 168.0 SprT family zinc-dependent metalloprotease F7308_RS08110 FSC454_RS04730 CGC43_RS05735
N0.HOG0000169 169.0 APC family permease F7308_RS08125 FSC454_RS04720 CGC43_RS05745
N0.HOG0000171 171.0 preprotein translocase subunit SecA F7308_RS08215 FSC454_RS06955 CGC43_RS05830
N0.HOG0000172 172.0 prepilin-type N-terminal cleavage/methylation domain-containing protein F7308_RS08250 FSC454_RS06995 CGC43_RS05870
N0.HOG0000173 173.0 L-threonine 3-dehydrogenase F7308_RS08555 FSC454_RS02830 CGC43_RS02735
N0.HOG0000174 174.0 helix-turn-helix domain-containing protein F7308_RS08840 FSC454_RS02545 CGC43_RS06195
N0.HOG0000175 175.0 Na+/H+ antiporter NhaA F7308_RS09635 FSC454_RS09210 CGC43_RS08800
N0.HOG0000176 176.0 ion channel F7308_RS07655 FSC454_RS06690 CGC43_RS05125
N0.HOG0000177 177.0 prepilin-type N-terminal cleavage/methylation domain-containing protein F7308_RS01620 FSC454_RS01430 CGC43_RS07410
N0.HOG0000178 178.0 hypothetical protein F7308_RS01625 FSC454_RS01435 CGC43_RS07405
N0.HOG0000179 179.0 ATP-binding protein FSC454_RS10030 CGC43_RS01425
N0.HOG0000180 180.0 hypothetical protein FSC454_RS09025, FSC454_RS09030 CGC43_RS08450
N0.HOG0000181 181.0 ATP-grasp domain-containing protein FSC454_RS07885 CGC43_RS01370
N0.HOG0000184 184.0 isochorismatase family protein FSC454_RS06785 CGC43_RS05215
N0.HOG0000226 226.0 transglutaminase family protein F7308_RS00165 FSC454_RS00145
N0.HOG0000337 337.0 putative basic amino acid antiporter YfcC F7308_RS01330 FSC454_RS01285
N0.HOG0000371 371.0 lytic polysaccharide monooxygenase F7308_RS05680 FSC454_RS04150
N0.HOG0000383 383.0 type I restriction endonuclease subunit R F7308_RS01500 CGC43_RS07490
N0.HOG0000499 499.0 lysine-sensitive aspartokinase 3 F7308_RS09360 FSC454_RS09085
N0.HOG0000544 544.0 FAD-dependent oxidoreductase F7308_RS02980 FSC454_RS08460
N0.HOG0000860 860.0 hypothetical protein F7308_RS10205 FSC454_RS04995
N0.HOG0000920 920.0 GTP-binding protein F7308_RS06890 FSC454_RS05590
N0.HOG0000923 923.0 TIM-barrel domain-containing protein F7308_RS06940 FSC454_RS05680
N0.HOG0000966 966.0 multidrug effflux MFS transporter F7308_RS07285 FSC454_RS04840
N0.HOG0001174 1174.0 aromatic amino acid transport family protein F7308_RS08030 FSC454_RS06385
N0.HOG0001211 1211.0 LysR family transcriptional regulator F7308_RS00150 FSC454_RS00130
N0.HOG0001262 1262.0 peptide MFS transporter F7308_RS08740 FSC454_RS02655
N0.HOG0001318 1318.0 MFS transporter F7308_RS06935 FSC454_RS05675
N0.HOG0001326 1326.0 N-acetyltransferase FSC454_RS00705 CGC43_RS00575
N0.HOG0001329 1329.0 arginine deiminase-related protein FSC454_RS07550 CGC43_RS01685
N0.HOG0001331 1331.0 MFS transporter FSC454_RS06980 CGC43_RS05855
N0.HOG0001332 1332.0 restriction endonuclease subunit S FSC454_RS04440 CGC43_RS05540
Number of HOGs refined: 117 for a total refinement of 124
Hello! I'm trying to test this tool with a small dataset, but I got a 'Segmentation fault' message and I don't know what's going on. Could you help me solve it?
./orthorefine.exe --input input.txt --OF_file N0.tsv --window_size 8 --synteny_ratio 0.5
Segmentation fault
input.txt
Head of N0.tsv:
HOG OG Gene Tree Parent Clade F.hispaniensis F.opportunistica-142155 F.salina
N0.HOG0000000 OG0000000 n0 lcl|NZ_CP018093.1_prot_FSC454_RS07665_1515, lcl|NZ_CP018093.1_prot_FSC454_RS03685_724 lc>
N0.HOG0000001 OG0000001 n0 lcl|NZ_CP022375.1_prot_WP_198150407.1_912, lcl|NZ_CP022375.1_prot_WP_198150407.1_1283, lcl>
N0.HOG0000002 OG0000002 n1 lcl|NZ_CP018093.1_prot_WP_197456248.1_1581, lcl|NZ_CP018093.1_prot_WP_197456248.1_582 lc>
N0.HOG0000003 OG0000002 n3 lcl|NZ_CP018093.1_prot_WP_156470857.1_1013, lcl|NZ_CP018093.1_prot_WP_244148270.1_1829, lc>
N0.HOG0000004 OG0000003 n0 lcl|NZ_CP018093.1_prot_FSC454_RS02585_506, lcl|NZ_CP018093.1_prot_FSC454_RS00585_115, lcl|>
N0.HOG0000005 OG0000004 n1 lcl|NZ_CP018093.1_prot_WP_066046703.1_444 lcl|NZ_CP022375.1_prot_WP_071629540.1_1295>
N0.HOG0000006 OG0000004 n4 lcl|NZ_CP018093.1_prot_WP_014715333.1_1297 lcl|NZ_CP022375.1_prot_WP_071629258.1_988 >
N0.HOG0000007 OG0000004 n6 lcl|NZ_CP018093.1_prot_WP_003033846.1_1161 lcl|NZ_CP022375.1_prot_WP_071629064.1_771 >
N0.HOG0000008 OG0000005 n0 lcl|NZ_CP018093.1_prot_FSC454_RS10000_1199, lcl|NZ_CP018093.1_prot_FSC454_RS06065_1201, lc>
N0.HOG0000009 OG0000006 n0 lcl|NZ_CP018093.1_prot_WP_071794787.1_854, lcl|NZ_CP018093.1_prot_WP_071794788.1_855, lcl|>
N0.HOG0000010 OG0000007 n1 lcl|NZ_CP018093.1_prot_WP_066046330.1_168 lcl|NZ_CP022375.1_prot_WP_071628521.1_157 >
N0.HOG0000011 OG0000007 n3 lcl|NZ_CP018093.1_prot_WP_066045340.1_1632 lcl|NZ_CP022375.1_prot_WP_071629755.1_1531>
N0.HOG0000012 OG0000008 n0 lcl|NZ_CP018093.1_prot_FSC454_RS01480_292, lcl|NZ_CP018093.1_prot_WP_231865178.1_1110, lcl>
N0.HOG0000013 OG0000009 n1 lcl|NZ_CP018093.1_prot_FSC454_RS05915_1169 lcl|NZ_CP022375.1_prot_WP_071629056.1_763 >
N0.HOG0000014 OG0000009 n3 lcl|NZ_CP018093.1_prot_WP_066045627.1_372 lcl|NZ_CP022375.1_prot_WP_071629606.1_1370>
N0.HOG0000015 OG0000010 n0 lcl|NZ_CP018093.1_prot_WP_066046902.1_1055, lcl|NZ_CP018093.1_prot_WP_080555366.1_704 lc>
N0.HOG0000016 OG0000011 n0 lcl|NZ_CP018093.1_prot_FSC454_RS03655_718 lcl|NZ_CP022375.1_prot_WP_071628872.1_552 >
N0.HOG0000017 OG0000012 n0 lcl|NZ_CP018093.1_prot_WP_066044704.1_770 lcl|NZ_CP022375.1_prot_WP_071628925.1_609 >
N0.HOG0000018 OG0000013 n1 lcl|NZ_CP018093.1_prot_WP_156860530.1_1116, lcl|NZ_CP018093.1_prot_WP_066045399.1_1117 lc>
N0.HOG0000019 OG0000013 n3 lcl|NZ_CP018093.1_prot_WP_156860531.1_1542, lcl|NZ_CP018093.1_prot_WP_156860529.1_1115, lc>
N0.HOG0000020 OG0000014 n1 lcl|NZ_CP018093.1_prot_WP_066045288.1_1653 lcl|NZ_CP022375.1_prot_WP_071629770.1_1546>
N0.HOG0000021 OG0000014 n3 lcl|NZ_CP018093.1_prot_WP_244148253.1_1179 lcl|NZ_CP022375.1_prot_WP_071629238.1_963,>
N0.HOG0000022 OG0000015 n1 lcl|NZ_CP018093.1_prot_WP_066046621.1_1788 lcl|NZ_CP022375.1_prot_WP_071629866.1_1651>
N0.HOG0000023 OG0000015 n3 lcl|NZ_CP018093.1_prot_WP_066046618.1_1787 lcl|NZ_CP022375.1_prot_WP_071629865.1_1650>
N0.HOG0000024 OG0000016 n1 lcl|NZ_CP018093.1_prot_WP_066046651.1_1806 lcl|NZ_CP022375.1_prot_WP_071629921.1_1711>
N0.HOG0000025 OG0000016 n4 lcl|NZ_CP018093.1_prot_WP_014549092.1_1805 lcl|NZ_CP022375.1_prot_WP_071629920.1_1710>
N0.HOG0000026 OG0000017 n1 lcl|NZ_CP018093.1_prot_WP_066045151.1_48 lcl|NZ_CP022375.1_prot_WP_071628416.1_40 >
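The identifiers in this N0.tsv have the shape `lcl|<sequence accession>_prot_<protein accession or locus tag>_<number>`, whereas the feature-table lookups elsewhere in this thread use bare accessions such as `WP_...`. To compare the two by hand, the embedded ID can be pulled out with a small helper like the one below. This is a sketch under that assumed pattern only: `strip_lcl_wrapper` is a hypothetical name, and nothing in this thread confirms whether OrthoRefine itself accepts or expects the stripped form.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OrthoRefine): extract the embedded protein
# accession or locus tag from an identifier of the assumed form
#   lcl|<sequence accession>_prot_<ID>_<number>
# Identifiers that do not match the pattern are printed unchanged.
strip_lcl_wrapper() {
    printf '%s\n' "$1" | sed -E 's/^lcl\|[^[:space:]]*_prot_(.+)_[0-9]+$/\1/'
}

# Demo on two identifiers copied from the N0.tsv head in this thread.
locus_id=$(strip_lcl_wrapper 'lcl|NZ_CP018093.1_prot_FSC454_RS07665_1515')
protein_id=$(strip_lcl_wrapper 'lcl|NZ_CP022375.1_prot_WP_198150407.1_912')
echo "$locus_id"
echo "$protein_id"
```

The extracted value can then be searched for in the matching feature table, for example with `grep -wF`.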