sarahpenir closed this issue 8 years ago.
Hi Sarah
Last night I ran into the same problem while trying to set up a virulence gene database for Salmonella, and I got an identical error message. Have you found a solution?
Many thanks, Yue
Hi @aphayt,
I was able to make the program work by modifying the "main" function of VFDB_cdhit_to_csv.py with the following code:
```python
# Replacement main() for VFDB_cdhit_to_csv.py; parse_args, re and SeqIO
# are defined/imported elsewhere in that script.
def main():
    args = parse_args()
    outfile = open(args.outfile, "w")  # open() instead of the Python 2-only file()
    outfile.write("seqID,clusterid,gene,allele,DNA,annotation\n")
    database = {}  # key = clusterid, value = list of seqIDs
    seq2cluster = {}  # key = seqID, value = clusterid
    for line in open(args.cluster_file):
        if line.startswith(">"):
            ClusterNr = line.split()[1]  # ">Cluster 0" -> "0"
            continue
        line_split = line.split(">")
        seqID = line_split[1].split("(")[0]  # keep the ID up to the first "("
        if ClusterNr not in database:
            database[ClusterNr] = []
        if seqID not in database[ClusterNr]:
            database[ClusterNr].append(seqID)  # for virulence gene DB, this is the unique ID R0xxx
        seq2cluster[seqID] = ClusterNr
    for record in SeqIO.parse(open(args.infile, "r"), "fasta"):
        clusterid = ""  # default, so clusterid is always bound for every record
        full_name = record.description
        genus = full_name.split("[")[2].split()[0]
        id_bits = re.sub("[()]", "", full_name.split("[")[0]).split()  # 'R004852 fliL VP2243 '
        seqID = full_name.split()[0].split("(")[0]  # R004852
        gene = id_bits[1]  # fliL
        if len(id_bits) > 2:
            allele = id_bits[1] + "_" + id_bits[2]  # fliL_VP2243
        else:
            allele = id_bits[1]
        if seqID in seq2cluster:
            clusterid = seq2cluster[seqID]
        outstring = ",".join([seqID, clusterid, gene, allele, str(record.seq),
                              re.sub(",", "", record.description)]) + "\n"
        outfile.write(outstring)
    outfile.close()
```
Hope this helps, Sarah P.
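To make the ID matching in the modified main() more concrete, here is a minimal sketch of its two parsing rules applied to a made-up CD-HIT .clstr excerpt. It assumes VFDB IDs where an accession is fused to the ID in parentheses (the hypothetical VFG000001(gb|XX000001)); parse_clusters and fasta_seq_id are hypothetical helpers for illustration, not functions from the script.

```python
import io

# Hypothetical CD-HIT .clstr excerpt; IDs and accessions are made up.
SAMPLE_CLSTR = (
    ">Cluster 0\n"
    "0\t1200nt, >VFG000001(gb|XX000001)... *\n"
    "1\t1180nt, >VFG000002(gb|XX000002)... at 95.00%\n"
    ">Cluster 1\n"
    "0\t900nt, >VFG000003(gb|XX000003)... *\n"
)


def parse_clusters(handle):
    """Build a seqID -> cluster number mapping using the same rules as main()."""
    seq2cluster = {}
    cluster_nr = None
    for line in handle:
        if line.startswith(">"):
            cluster_nr = line.split()[1]  # ">Cluster 0" -> "0"
            continue
        # Keep the text after '>' up to the first '(' as the sequence ID
        seq_id = line.split(">")[1].split("(")[0]
        seq2cluster[seq_id] = cluster_nr
    return seq2cluster


def fasta_seq_id(description):
    """Apply main()'s seqID rule to a FASTA description line."""
    return description.split()[0].split("(")[0]


seq2cluster = parse_clusters(io.StringIO(SAMPLE_CLSTR))
header = "VFG000002(gb|XX000002) (abcA) hypothetical protein [VF] [Genus species]"
seq_id = fasta_seq_id(header)
print(seq_id, seq2cluster.get(seq_id, ""))  # → VFG000002 0
```

The `.get(seq_id, "")` call plays the same role as the `clusterid = ""` default in the loop above: an ID that is missing from the cluster file yields an empty cluster column instead of an error.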
Hi Sarah
Many thanks for sharing. After following the steps in 'Error in step: Using the VFDB Virulence Factor Database with SRST2' #59, I have built two VF databases, one for Campylobacter and one for Salmonella.
Best, Yue
Good day,
When I ran VFDB_cdhit_to_csv.py on my cluster file, I got the following error:
```
Traceback (most recent call last):
  File "../database_clustering/VFDB_cdhit_to_csv.py", line 67, in <module>
    sys.exit(main())
  File "../database_clustering/VFDB_cdhit_to_csv.py", line 61, in main
    outstring = ",".join([seqID, clusterid, gene, allele, str(record.seq), re.sub(",","",record.description)]) + "\n"
UnboundLocalError: local variable 'clusterid' referenced before assignment
```
What could have caused the error?
Thank you very much, Sarah
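A likely cause, assuming the original script assigns `clusterid` only inside an `if seqID in seq2cluster` branch, is that Python raises UnboundLocalError when the first FASTA record's ID is not found in the cluster file. The sketch below reproduces this with two hypothetical functions. rows_with_default mirrors the `clusterid = ""` reset used in the modified main(); the IDs are made up.

```python
def rows_without_default(seq_ids, seq2cluster):
    """clusterid is only bound when an ID matches (the suspected original pattern)."""
    rows = []
    for seq_id in seq_ids:
        if seq_id in seq2cluster:
            clusterid = seq2cluster[seq_id]
        rows.append(seq_id + "," + clusterid)
    return rows


def rows_with_default(seq_ids, seq2cluster):
    """clusterid is reset for every record, as in the modified main()."""
    rows = []
    for seq_id in seq_ids:
        clusterid = ""  # default when the ID is missing from the cluster file
        if seq_id in seq2cluster:
            clusterid = seq2cluster[seq_id]
        rows.append(seq_id + "," + clusterid)
    return rows


seq2cluster = {"VFG000001": "0"}  # hypothetical mapping

try:
    # The first ID is missing, so clusterid has never been assigned
    rows_without_default(["VFG000009", "VFG000001"], seq2cluster)
except UnboundLocalError as err:
    print("UnboundLocalError:", err)

# A missing ID after a matching one silently reuses the previous value
print(rows_without_default(["VFG000001", "VFG000009"], seq2cluster))
# → ['VFG000001,0', 'VFG000009,0']
print(rows_with_default(["VFG000009", "VFG000001"], seq2cluster))
# → ['VFG000009,', 'VFG000001,0']
```

Without the reset, the error only appears when the very first record is unmatched; a later unmatched record would instead get the previous record's cluster number written to the CSV with no error at all.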