paul-bio opened this issue 2 years ago.
Hello @paul-bio,
Thank you so much for your interest in finder. You have to download the genome file from this link:
http://ftp.ensemblgenomes.org/pub/plants/release-52/fasta/arabidopsis_thaliana/dna/Arabidopsis_thaliana.TAIR10.dna_sm.toplevel.fa.gz
Currently, I am a bit busy with work; I will update the README file later. Also, please make sure you enter the correct location of the dummy data directory in the metadata.csv file. It should be the location where you performed the git pull.
Please let us know if you encounter any issues while running the software.
Thank you
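The link downloads a gzip-compressed FASTA (.fa.gz), while the --genome option used later in this thread is given the uncompressed .fa name, so the file has to be decompressed before finder can find it. A minimal sketch, assuming gzip and GNU coreutils (realpath) are available; prepare_genome is a hypothetical helper, not part of finder:

```shell
#!/usr/bin/env bash
# Hypothetical helper: decompress a .fa.gz genome (keeping the .gz)
# and print the absolute path of the resulting .fa for --genome.
prepare_genome() {
    gz="$1"              # path to the downloaded .fa.gz file
    fa="${gz%.gz}"       # same path with the .gz suffix stripped
    if [ ! -s "$fa" ]; then
        # decompress to stdout so the original .gz is left in place
        gzip -dc "$gz" > "$fa"
    fi
    realpath "$fa"
}

# Usage (the download needs network access):
#   wget http://ftp.ensemblgenomes.org/pub/plants/release-52/fasta/arabidopsis_thaliana/dna/Arabidopsis_thaliana.TAIR10.dna_sm.toplevel.fa.gz
#   prepare_genome Arabidopsis_thaliana.TAIR10.dna_sm.toplevel.fa.gz
```

Printing the absolute path also fits the advice given later in this thread to pass full paths to the program.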
Thanks. I downloaded the genome data from the link you mentioned, but then I suddenly got the error message below:
cp: cannot stat ‘Arabidopsis_thaliana.TAIR10.dna_sm.toplevel.fa’: No such file or directory
[E::fai_build3_core] Failed to open the file /NFS2/users/creo9447/finder_tutorial/1.raw_data/FINDER/Arabidopsis_thaliana.TAIR10.dna_sm.toplevel.fa
[faidx] Could not build fai index /NFS2/users/creo9447/finder_tutorial/1.raw_data/FINDER/Arabidopsis_thaliana.TAIR10.dna_sm.toplevel.fa.fai
Traceback (most recent call last):
File "/NFS2/users/creo9447/software/Finder/Finder-finder_v1.1.0/finder", line 688, in
What should I do?
++Also, in the Finder/example/raw_data/ path there are two FASTQ files (dummy_data1.fastq and dummy_data2.fastq.gz). dummy_data1.fastq looks normal, but dummy_data2.fastq.gz does not seem to contain FASTQ content.
++And this is the command I used:
$ finder -mf metadata.xlsx --output_dir ./FINDER --genome Arabidopsis_thaliana.TAIR10.dna_sm.toplevel.fa --organism_model PLANTS --genemark_path /software/Finder/geneMark/gmes_linux_64/ --genemark_license /software/Finder/GeneMark/gm_key_64 --cpu 32 --no_cleanup --protein uniprot_ARATH.fasta
Hello Paul,
Thanks for posting the error message. There are a few tweaks you need to make to the command you are running. Firstly, it should be metadata.csv, not metadata.xlsx. Secondly, could you please make sure that the genome file is actually present in the current directory? I always pass full paths (starting from /) to a program. That way I can execute it from anywhere without worrying about changing anything. It seems that finder cannot locate the genome file. Thirdly, the command you should execute is run_finder, not finder. run_finder will check your system for the presence of docker or singularity and, depending on which one is available, it will execute finder.
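Following the advice to give full paths, one way to catch "No such file or directory" errors before a long run is to resolve every input to an absolute path and check that it exists. A minimal sketch, assuming realpath is available; check_inputs is a hypothetical helper and is not part of finder or run_finder:

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: print the absolute path of every input
# that exists, report missing ones on stderr, and return non-zero if any is missing.
check_inputs() {
    status=0
    for f in "$@"; do
        if [ -e "$f" ]; then
            realpath "$f"
        else
            echo "missing: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage, from the directory that holds the inputs:
#   check_inputs metadata.csv Arabidopsis_thaliana.TAIR10.dna_sm.toplevel.fa uniprot_ARATH.fasta
```

The printed absolute paths can then be pasted into the -mf, --genome and --protein arguments.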
dummy_data2.fastq.gz is a gzip-compressed FASTQ file. If you open it directly, you will see garbled text, which is expected since compressed data is binary and not human-readable. I included it in the example run to demonstrate that finder can work with compressed RNA-Seq samples as well.
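To confirm that a .fastq.gz file really contains FASTQ records, decompress it to standard output instead of opening it in a text editor. A short sketch; peek_fastq is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first FASTQ record (4 lines)
# of either a plain or a gzip-compressed file.
peek_fastq() {
    case "$1" in
        *.gz) gzip -dc "$1" | head -n 4 ;;   # decompress on the fly
        *)    head -n 4 "$1" ;;
    esac
}

# Usage:
#   peek_fastq Finder/example/raw_data/dummy_data2.fastq.gz
```

A readable record starting with an @ header line means the compressed file is intact.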
Please let me know if you run into any further problems.
Thank you.
Thank you for your help. As you suggested, typing the full path seems to work now. I also changed the metadata to CSV format. But now I get this error message:
cat: /NFS2/users/creo9447/finder_tutorial/FINDER/alignments/dummy_round1_SJ.out.tab: No such file or directory
cat: /NFS2/users/creo9447/finder_tutorial/FINDER/alignments/dummy_round2_SJ.out.tab: No such file or directory
mv: cannot stat ‘/NFS2/users/creo9447/finder_tutorial/FINDER/alignments/dummy_final_Unmapped.out.mate1’: No such file or directory
mv: cannot stat ‘/NFS2/users/creo9447/finder_tutorial/FINDER/alignments/dummy_final_Log.final.out’: No such file or directory
cat: /NFS2/users/creo9447/finder_tutorial/FINDER/alignments/dummy_round3_SJ.out.tab: No such file or directory
samtools index: "/NFS2/users/creo9447/finder_tutorial/FINDER/alignments/dummy_final.sortedByCoord.out.bam" is in a format that cannot be usefully indexed
samtools index: "/NFS2/users/creo9447/finder_tutorial/FINDER/alignments/dummy_final.sortedByCoord.out.bam" is in a format that cannot be usefully indexed
sh: junc: command not found
sh: subexon-info: command not found
[main_samview] fail to read the header from "/NFS2/users/creo9447/finder_tutorial/FINDER/alignments/dummy_final.sortedByCoord.out.bam".
[main_samview] fail to read the header from "/NFS2/users/creo9447/finder_tutorial/FINDER/alignments/dummy_for_psiclass.sam".
mv: cannot stat ‘/NFS2/users/creo9447/finder_tutorial/FINDER/assemblies_psiclass_modified/combined/psiclass_output_sample_0.gtf’: No such file or directory
mv: cannot stat ‘/NFS2/users/creo9447/finder_tutorial/FINDER/assemblies_psiclass_modified/combined/psiclass_output_vote.gtf’: No such file or directory
Traceback (most recent call last):
File "/NFS2/users/creo9447/software/Finder/Finder-finder_v1.1.0/finder", line 688, in
The protein, genome, metadata, and FASTQ files are all in the same directory (finder_tutorial). Since I am running FINDER on a local Linux machine, I couldn't use Docker... And this is the metadata.csv I used: metadata.CSV
Could you suggest anything that might help? Thank you :)
Hello @paul-bio,
Thank you for posting this. Could you please confirm that you have either singularity or docker installed on your system? Please paste the outputs of which docker and which singularity. Are you running this on your personal computer or on a computational cluster?
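The two which checks can be combined into one command. The sketch below uses command -v, the POSIX counterpart of which; it only mirrors the idea that run_finder looks for docker or singularity and is not run_finder's actual code. detect_framework is hypothetical, and checking docker before singularity is an assumption:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which container runtime is on PATH.
# Prints "docker", "singularity", or "none".
detect_framework() {
    if command -v docker >/dev/null 2>&1; then
        echo docker
    elif command -v singularity >/dev/null 2>&1; then
        echo singularity
    else
        echo none
    fi
}

# Usage:
#   detect_framework
```

On a cluster, a container runtime is often provided through an environment module rather than being on PATH by default, so "none" may simply mean the module has not been loaded yet.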
Thank you.
I am currently running on a computational cluster. I am used to working in a conda environment, but there were none.
First, I downloaded FINDER with wget:
$ wget https://github.com/sagnikbanerjee15/Finder/archive/refs/tags/finder_v1.1.0.tar.gz
Since run_finder works with docker, I used finder instead.
I also got the GeneMark key and files from the website.
Thank you.
Actually, the docker image has all the software preinstalled. finder will not work in a conda environment, since that is no longer maintained.
Hope that helps!
Thank you.
Hello again. I installed docker and got an error message...
This is the command:
$ run_finder -mf /NFS2/users/creo9447/software/FINDER/example/Arabidopsis_thaliana_metadata.csv -out_dir /NFS2/users/creo9447/software/FINDER/example/FINDER -g /NFS2/users/creo9447/software/FINDER/example/Arabidopsis_thaliana.TAIR10.dna_sm.toplevel.fa --protein /NFS2/users/creo9447/software/FINDER/example/uniprot_ARATH.fasta -om PLANTS --genemark_path /NFS2/users/creo9447/software/GeneMark/gmes_linux_64/ --genemark_license /NFS2/users/creo9447/software/GeneMark/gm_key_64 --cpu 32 --framework docker
And this is the log:
Trying to pull repository docker.io/sagnikbanerjee15/finder ...
1.1.0: Pulling from docker.io/sagnikbanerjee15/finder
Digest: sha256:9816d258d2421d4625983c929f508b1f577cfe7ab3bc2042e841647a186c7931
Status: Image is up to date for docker.io/sagnikbanerjee15/finder:1.1.0
done
cat: /NFS2/users/creo9447/software/FINDER/example/FINDER/alignments/dummy_data1_round1_SJ.out.tab: No such file or directory
cat: /NFS2/users/creo9447/software/FINDER/example/FINDER/alignments/dummy_data1_round2_SJ.out.tab: No such file or directory
mv: cannot stat '/NFS2/users/creo9447/software/FINDER/example/FINDER/alignments/dummy_data1_final_Unmapped.out.mate1': No such file or directory
mv: cannot stat '/NFS2/users/creo9447/software/FINDER/example/FINDER/alignments/dummy_data1_final_Log.final.out': No such file or directory
cat: /NFS2/users/creo9447/software/FINDER/example/FINDER/alignments/dummy_data1_round3_SJ.out.tab: No such file or directory
samtools index: "/NFS2/users/creo9447/software/FINDER/example/FINDER/alignments/dummy_data1_final.sortedByCoord.out.bam" is in a format that cannot be usefully indexed
samtools index: "/NFS2/users/creo9447/software/FINDER/example/FINDER/alignments/dummy_data1_final.sortedByCoord.out.bam" is in a format that cannot be usefully indexed
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
Can not open /NFS2/users/creo9447/software/FINDER/example/FINDER/alignments/dummy_data1_final.sortedByCoord.out.bam.
[main_samview] fail to read the header from "/NFS2/users/creo9447/software/FINDER/example/FINDER/alignments/dummy_data1_final.sortedByCoord.out.bam".
[main_samview] fail to read the header from "/NFS2/users/creo9447/software/FINDER/example/FINDER/alignments/dummy_data1_for_psiclass.sam".
mv: cannot stat '/NFS2/users/creo9447/software/FINDER/example/FINDER/assemblies_psiclass_modified/combined/psiclass_output_sample_0.gtf': No such file or directory
mv: cannot stat '/NFS2/users/creo9447/software/FINDER/example/FINDER/assemblies_psiclass_modified/combined/psiclass_output_vote.gtf': No such file or directory
Traceback (most recent call last):
File "/softwares/FINDER/Finder/finder", line 688, in
What might be the problem? Sorry for bothering you again... Thanks anyway.
Hello @paul-bio,
Could you please try again after removing the output directory? Also, please send me the contents of the file /NFS2/users/creo9447/software/FINDER/example/Arabidopsis_thaliana_metadata.csv
Thank you.
Sorry for bothering you again, sagnikbanerjee.
Here is the metadata file I used:
Arabidopsis_thaliana_metadata.csv
When I removed the output directory, a shorter error message appeared:
Trying to pull repository docker.io/sagnikbanerjee15/finder ...
1.1.0: Pulling from docker.io/sagnikbanerjee15/finder
Digest: sha256:9816d258d2421d4625983c929f508b1f577cfe7ab3bc2042e841647a186c7931
Status: Image is up to date for docker.io/sagnikbanerjee15/finder:1.1.0
done
rm: cannot remove '/NFS2/users/creo9447/software/FINDER/example/FINDER/assemblies_psiclassmodified/combined/outputfileforCPD*': No such file or directory
Traceback (most recent call last):
File "/softwares/FINDER/Finder/finder", line 688, in
Thank you
Hello @paul-bio,
Thanks for your reply, and please do not hesitate to ask questions. Feedback like this will make finder even better.
I looked at the metadata.csv file and it seems everything is correct. Is there any reason why you are running the example with only one RNA-Seq dataset?
Could you send me the output of the command ls -lhrt /NFS2/users/creo9447/software/FINDER/example/FINDER/alignments? And also the progress.log file.
Thank you.
Thanks.
I am trying to get used to this tool, and I am planning to use it in my project. So, as an initial step toward learning it, I am analyzing the data downloaded from this GitHub repository. In raw_data there were two files: dummy_data1.fastq and dummy_data2.fastq.gz. Since I had no information about how the data were sequenced, I first thought the two files were two separate single-end samples, so I only used dummy_data1.fastq. Is it a paired-end file?
Anyway, I copied the listing of the alignments directory, and here is the progress.log file: progress.log
$ ls -lhrt "/NFS2/users/creo9447/software/FINDER/example/FINDER/alignments/"
total 440K
-rw-r--r--. 1 root root 0 Feb 11 22:08 dummy_data1_round1.error
-rw-r--r--. 1 root root 772 Feb 11 22:08 dummy_data1_round1_SJ.out.tab
-rw-r--r--. 1 root root 2.0K Feb 11 22:08 dummy_data1_round1_Log.final.out
-rw-r--r--. 1 root root 1.2K Feb 11 22:08 dummy_data1_round1.output
-rw-r--r--. 1 root root 0 Feb 11 22:08 leaf_round1_SJ.out.tab
-rw-r--r--. 1 root root 0 Feb 11 22:08 dummy_data1_round2.error
-rw-r--r--. 1 root root 0 Feb 11 22:08 dummy_data1_round2_SJ.out.tab
-rw-r--r--. 1 root root 2.0K Feb 11 22:08 dummy_data1_round2_Log.final.out
-rw-r--r--. 1 root root 1.4K Feb 11 22:08 dummy_data1_round2.output
-rw-r--r--. 1 root root 0 Feb 11 22:08 leaf_round2_SJ.out.tab
-rw-r--r--. 1 root root 0 Feb 11 22:08 leaf_round1_and_round2_SJ.out.tab
-rw-r--r--. 1 root root 0 Feb 11 22:08 dummy_data1_round3.error
-rw-r--r--. 1 root root 0 Feb 11 22:08 dummy_data1_round3_SJ.out.tab
-rw-r--r--. 1 root root 2.0K Feb 11 22:09 dummy_data1_round3_Log.final.out
-rw-r--r--. 1 root root 1.2K Feb 11 22:09 dummy_data1_round3.output
-rw-r--r--. 1 root root 0 Feb 11 22:09 leaf_round3_SJ.out.tab
-rw-r--r--. 1 root root 0 Feb 11 22:09 leaf_round1_and_round2_and_round3_SJ.out.tab
-rw-r--r--. 1 root root 261 Feb 11 22:09 dummy_data1_round5.error
-rw-r--r--. 1 root root 21K Feb 11 22:09 dummy_data1_final.sortedByCoord.out.bam
-rw-r--r--. 1 root root 56K Feb 11 22:09 dummy_data1_final.sortedByCoord.out.bam.bai
-rw-r--r--. 1 root root 873 Feb 11 22:09 dummy_data1_final.sortedByCoord.out.bam.csi
-rw-r--r--. 1 root root 748 Feb 11 22:09 dummy_data1_introns
-rw-r--r--. 1 root root 460 Feb 11 22:09 dummy_data1_introns.bed
-rw-r--r--. 1 root root 3.5K Feb 11 22:09 dummy_data1_exons
-rw-r--r--. 1 root root 1.1K Feb 11 22:09 dummy_data1_exons.bed
-rw-r--r--. 1 root root 595 Feb 11 22:09 dummy_data1_num_exons_in_intron
-rw-r--r--. 1 root root 94K Feb 11 22:09 dummy_data1_final.sortedByCoord.out.sam
-rw-r--r--. 1 root root 94K Feb 11 22:09 dummy_data1_for_psiclass.sam
-rw-r--r--. 1 root root 21K Feb 11 22:09 dummy_data1_for_psiclass.bam
-rw-r--r--. 1 root root 257 Feb 11 22:09 mapping_stats.csv
-rw-r--r--. 1 root root 0 Feb 11 22:09 dummy_data1_counts.output
-rw-r--r--. 1 root root 0 Feb 11 22:09 dummy_data1_counts.error
-rw-r--r--. 1 root root 8.1K Feb 11 22:09 dummy_data1_counts_genome_cov.bed
-rw-r--r--. 1 root root 20 Feb 11 22:09 dummy_data1_counts_all_info.pkl
-rw-r--r--. 1 root root 56K Feb 11 22:09 dummy_data1_for_psiclass.bam.bai
-rw-r--r--. 1 root root 874 Feb 11 22:09 dummy_data1_for_psiclass.bam.csi
-rw-r--r--. 1 root root 0 Feb 11 22:09 dummy_data1_SJ_regtools.bed.output
-rw-r--r--. 1 root root 341 Feb 11 22:09 dummy_data1_SJ_regtools.bed.error
-rw-r--r--. 1 root root 1.8K Feb 11 22:09 dummy_data1_SJ_regtools.bed
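In a listing like this one, the zero-byte files are worth a closer look. An empty *.error file only means nothing was written to stderr, but an empty *_SJ.out.tab splice-junction table may mean that an alignment round found no junctions or did not run properly. A short sketch to list just the empty files in a directory; list_empty_outputs is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list zero-byte regular files directly inside a
# directory (no recursion), one name per line, sorted by name.
list_empty_outputs() {
    find "$1" -maxdepth 1 -type f -size 0c -exec basename {} \; | sort
}

# Usage:
#   list_empty_outputs /NFS2/users/creo9447/software/FINDER/example/FINDER/alignments
```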
Thanks.
Hello @paul-bio,
Thank you for deciding to use finder in your project. I would recommend that you use the entire metadata.csv file and not just the dummy data. The dummy data contain very few reads, which are not enough to generate any annotations. The reason for including them is to ensure that the pipeline can process locally available data. Don't worry about not having the rest of the data: finder will automatically download it from NCBI SRA. This is the command you should try:
# Remove the output directory
rm -rf /NFS2/users/creo9447/software/FINDER/example/FINDER
# Run the program with the entire metadata
run_finder -mf /NFS2/users/creo9447/software/FINDER/example/Arabidopsis_thaliana_metadata.csv -out_dir /NFS2/users/creo9447/software/FINDER/example/FINDER -g /NFS2/users/creo9447/software/FINDER/example/Arabidopsis_thaliana.TAIR10.dna_sm.toplevel.fa --protein /NFS2/users/creo9447/software/FINDER/example/uniprot_ARATH.fasta -om PLANTS --genemark_path /NFS2/users/creo9447/software/GeneMark/gmes_linux_64/ --genemark_license /NFS2/users/creo9447/software/GeneMark/gm_key_64 --cpu 32 --framework docker
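To see for yourself how few reads the dummy files hold, you can count the records in each FASTQ. This sketch assumes standard four-line FASTQ records (multi-line records would be miscounted); count_reads is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count reads in a plain or gzip-compressed FASTQ,
# assuming every record spans exactly 4 lines.
count_reads() {
    case "$1" in
        *.gz) lines=$(gzip -dc "$1" | wc -l) ;;
        *)    lines=$(wc -l < "$1") ;;
    esac
    echo $(( lines / 4 ))
}

# Usage:
#   count_reads Finder/example/raw_data/dummy_data1.fastq
#   count_reads Finder/example/raw_data/dummy_data2.fastq.gz
```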
Please let me know if this works.
Thank you.
Hi again.
With the command you suggested, an error message still came out.
It seems there is a problem with .gm_key when running BRAKER.
Here are braker.error.txt, progress.log, and the error message I got:
braker.error.txt
progress.log
Trying to pull repository docker.io/sagnikbanerjee15/finder ...
1.1.0: Pulling from docker.io/sagnikbanerjee15/finder
Digest: sha256:9816d258d2421d4625983c929f508b1f577cfe7ab3bc2042e841647a186c7931
Status: Image is up to date for docker.io/sagnikbanerjee15/finder:1.1.0
done
rm: cannot remove '/NFS2/users/creo9447/software/FINDER/example/FINDER/assemblies_psiclassmodified/combined/outputfileforCPD*': No such file or directory
Traceback (most recent call last):
File "/softwares/FINDER/Finder/finder", line 688, in
main()
File "/softwares/FINDER/Finder/finder", line 673, in main
addBRAKERPredictions( options, logger_proxy, logging_mutex )
File "/softwares/FINDER/Finder/scripts/predictGenesUsingBRAKER.py", line 287, in addBRAKERPredictions
fhr = open( options.output_assemblies_psiclass_terminal_exon_length_modified + "/proteins_comparison_gffcompare.proteins_for_alignment.gtf.refmap", "r" )
FileNotFoundError: [Errno 2] No such file or directory: '/NFS2/users/creo9447/software/FINDER/example/FINDER/assemblies_psiclass_modified/proteins_comparison_gffcompare.proteins_for_alignment.gtf.refmap'
I also noticed that your GitHub README says I need GeneMark-ES/ET/EP version 4.62. However, the website currently offers version 4.69. Is it possible that the error keeps appearing because of the different version?
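On the .gm_key question: GeneMark's installation notes ask for the license key to be placed in the home directory as ~/.gm_key, and the downloaded key (gm_key_64 here) may arrive gzipped. Whether finder's docker run reads ~/.gm_key or only the file passed via --genemark_license is not stated in this thread, so the sketch below is an assumption-based check, not finder's procedure. install_gm_key is hypothetical; the destination defaults to ~/.gm_key but can be overridden:

```shell
#!/usr/bin/env bash
# Hypothetical helper: install a GeneMark key as ~/.gm_key (or a given
# destination), decompressing it first if it is still gzipped.
install_gm_key() {
    key="$1"
    dest="${2:-$HOME/.gm_key}"
    if gzip -t "$key" 2>/dev/null; then
        gzip -dc "$key" > "$dest"    # key is still compressed, e.g. gm_key_64.gz
    else
        cp "$key" "$dest"            # key is already plain text, e.g. gm_key_64
    fi
}

# Usage:
#   install_gm_key /NFS2/users/creo9447/software/GeneMark/gm_key_64
```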
Hello @paul-bio,
Thanks for sending me the error files. I checked the progress.log and it seems like you did not use the metadata.csv file from the GitHub repo. It contains only the dummy data. Please rerun the program with the original metadata file. I don't think the version of GeneMark-ES/ET/EP would matter in this case.
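A quick way to check whether a metadata file lists more than the dummy samples is to count its data rows. This sketch assumes the first line of the CSV is a header row, which this thread does not confirm; count_metadata_rows is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count non-blank data rows in a CSV,
# skipping the first line on the assumption that it is a header.
count_metadata_rows() {
    tail -n +2 "$1" | grep -c '[^[:space:]]'
}

# Usage:
#   count_metadata_rows /NFS2/users/creo9447/software/FINDER/example/Arabidopsis_thaliana_metadata.csv
```

A count of one or two rows would suggest that only the local dummy samples are listed.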
Thank you.
Hi @sagnikbanerjee15,
I changed my metadata.csv, and this time I got many more errors than in previous runs: error_message.txt
This is the metadata file I used. I don't think there is any problem with the metadata or the raw data now: Arabidopsis_thaliana_metadata.csv
Thanks.
Hello @paul-bio,
Thanks for posting the error. The command looks good and so does the metadata file. I will need some time to figure out the problem. I will let you know when I am done.
Thank you.
Thanks @sagnikbanerjee15, I hope the problem gets fixed soon.
Thanks a lot.
Hi, I am trying to predict gene structures using Finder. It seems this tool is better than PASA, MAKER, etc., so I am planning to learn how to use it.
However, from the example data you shared, I could get the metadata, protein data, and raw data, but I could not find the genome data. Where can I find the genome sequence?
Thanks a lot for letting us use this beautiful tool. Sincerely, Paul.