Open FadelBerakdar opened 4 years ago
Solution: The FASTA splitter script below works fine. It splits a gzipped multi-FASTA file into one file per sequence; set the path2MultiFasta and path2outputDir variables before running it.
```
import gzip
import os

path2MultiFasta = ""  # path to the gzipped multi-FASTA input
path2outputDir = ""   # directory that receives one .fa file per sequence


def write_sequence(seqName, seqBuffer, seqLength):
    # Write one record; only the first header token is kept as the name.
    print(seqName, seqLength)
    fileName = os.path.join(path2outputDir, seqName[1:] + ".fa")
    with open(fileName, "w") as out:
        out.write(seqName + "\n")
        out.writelines(seqBuffer)


seqName = ""
seqBuffer = []
seqLength = 0
# >mitochondrion_genome 19517
# mitochondrion_genome 19524
with gzip.open(path2MultiFasta, "rt", encoding="ascii") as file:
    for line in file:
        if line[0] == '>':
            # A new header starts: flush the previous sequence, if any.
            if seqName != "":
                write_sequence(seqName, seqBuffer, seqLength)
            # split() with no argument also drops the trailing newline
            # when the header contains no spaces.
            seqName = line.split()[0]
            seqLength = 0
            seqBuffer = []
        else:
            seqBuffer.append(line)
            seqLength += len(line.rstrip("\n"))

# Flush the last sequence in the file.
if seqName != "":
    write_sequence(seqName, seqBuffer, seqLength)
```
Problem: Sequence names in the FASTA files don't match their names in the GTF file.
Log message:
Looking for FASTA references in genome_dir/References found :
[INFO] I am collecting information on the run.
initializing profiler
[INFO] Reading error model 76 bases model
[WARN] The error model supports a read length of 76 but you are trying to create reads of length 100. We are scaling.
[INFO] Checking GTF file
[PROFILING] I am assigning the expression profile
Reading reference annotation OK (00:00:00)
found 0 transcripts
[PROFILING] Parameters
NB_MOLECULES 5000000
EXPRESSION_K -0.6
EXPRESSION_X0 9500.0
EXPRESSION_X1 9.025E7
PRO_FILE_NAME /home/zingo/simulation/dataset/FluxSimulator/fluxSimulator.pro
[ERROR] Profiler has no molecules!
java.lang.RuntimeException: Profiler has no molecules!
at barna.flux.simulator.SimulationPipeline.call(SimulationPipeline.java:438)
at barna.flux.simulator.SimulationPipeline.call(SimulationPipeline.java:54)
at barna.commons.launcher.Flux.main(Flux.java:198)
Cannot open dataset/FluxSimulator/fluxSimulator.fastq at /home/zingo/perl5/lib/perl5/CracTools/Utils.pm line 551.
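
To confirm that a name mismatch is what leads to "found 0 transcripts", it can help to compare the names on both sides before running the simulation. The following is only a rough sketch, assuming that the sequence name is column 1 of the GTF and that each FASTA file in the genome directory is named after its sequence (as the splitter script above produces). The function names `gtf_seqnames`, `fasta_seqnames` and `compare_names` are made up for this illustration and are not part of Flux Simulator or CracTools.

```python
import os


def gtf_seqnames(gtf_path):
    """Return the set of sequence names found in column 1 of a GTF file."""
    names = set()
    with open(gtf_path) as gtf:
        for line in gtf:
            # Skip comment lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.split("\t", 1)[0])
    return names


def fasta_seqnames(genome_dir, extensions=(".fa", ".fasta")):
    """Return sequence names implied by FASTA file names (extension removed).

    Assumption: one FASTA file per sequence, named <sequence name>.fa.
    """
    names = set()
    for entry in os.listdir(genome_dir):
        base, ext = os.path.splitext(entry)
        if ext in extensions:
            names.add(base)
    return names


def compare_names(gtf_path, genome_dir):
    """Return (names only in the GTF, names only in the FASTA directory)."""
    gtf = gtf_seqnames(gtf_path)
    fasta = fasta_seqnames(genome_dir)
    return sorted(gtf - fasta), sorted(fasta - gtf)
```

Calling `compare_names("annotation.gtf", "genome_dir")` (both paths are placeholders) returns two sorted lists. If the first list contains most of the GTF names, for example `1` in the GTF against `chr1.fa` on disk, the names need to be harmonized on one side before the profiler can find any transcripts.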