groupschoof / AHRD

High-throughput protein function annotation with Human Readable Descriptions (HRDs) and Gene Ontology (GO) Terms.
https://www.cropbio.uni-bonn.de/

Input format and batcher mode issue #2

Closed sitaramrajaraman closed 9 years ago

sitaramrajaraman commented 9 years ago

I second biojon regarding the input format. It does not seem to work with blast+, only with the normal blast version. I resorted to using the same version that you used in your test cases.

I also noticed that when running batcher mode (sample file: "batcher_input_test.yml"), the line "find_highest_possible_evaluation_score: true" is active, which seems to throw an error and results in an empty output file. This line is not present in the example "ahrd_example_input.yml". I do not know how it impacts the final result.

groupschoof commented 9 years ago

Dear Sitaramrajaraman,

thank you for your feedback.

The issue Biojon brought up is being worked on. We are implementing a new parser, which we hope to release early next year.

I just ran java -cp ./dist/ahrd.jar ahrd.controller.Batcher ./test/resources/batcher_input_test.yml without any errors. Please provide more information on the command line call you used.

sitaramrajaraman commented 9 years ago

Yes, OK. Running the command you mentioned above generates a shell script file, start_ahrd_batched.sh. When I run this script, which in turn points to the individual yml files, I get the following error:

Usage:
java -Xmx2g -jar ahrd.jar input.yml

Started AHRD...

...initialised proteins in 0sec, currently occupying 52 MB
...parsed blast results in 62sec, currently occupying 228 MB
...parsed gene ontology results in 0sec, currently occupying 228 MB
...assigned highestest scoring human readable descriptions in 0sec, currently occupying 278 MB
Writing output to '/archive/Work/Data/Birch/AHRD/test_folder/group_11_modified_ahrd_out.csv'.
We are sorry, an un-expected ERROR occurred:
java.lang.NullPointerException
    at ahrd.view.OutputWriter.buildHighestPossibleEvaluationScoreColumn(Unknown Source)
    at ahrd.view.OutputWriter.writeOutput(Unknown Source)
    at ahrd.controller.AHRD.main(Unknown Source)

Probing into this error led me to this line, which was present in the yml file:

find_highest_possible_evaluation_score: true

I did not notice this line in the normal example yml file ahrd_example_input.yml. After commenting out this line, the script ran successfully:

Usage:
java -Xmx2g -jar ahrd.jar input.yml

Started AHRD...

...initialised proteins in 0sec, currently occupying 52 MB
...parsed blast results in 61sec, currently occupying 714 MB
...parsed gene ontology results in 0sec, currently occupying 714 MB
...assigned highestest scoring human readable descriptions in 0sec, currently occupying 142 MB
Writing output to '/archive/Work/Data/Birch/AHRD/test_folder/group_11_modified_ahrd_out.csv'.
Wrote output in 0sec, currently occupying 142 MB

DONE

I'm not sure how this line impacts the final result, but as of now it works for me only if this line is commented out. This is my issue / question.
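
For anyone wanting to reproduce the workaround described above, here is a minimal sketch that comments out the find_highest_possible_evaluation_score line in a set of YAML files. The BATCH_DIR location and the comment_out_key helper are assumptions made for illustration and are not part of AHRD; only the key name is taken from the YAML file quoted above. It does not explain why the key triggers the NullPointerException, it only disables it.

```shell
#!/bin/sh
# Sketch of the workaround: comment out the key that triggered the error.
# BATCH_DIR is an assumed location; point it at the directory that holds
# your generated batch YAML files.
BATCH_DIR="${BATCH_DIR:-./test/resources/batch_ymls}"

# comment_out_key FILE
# Prefixes the key with "# " in place, preserving indentation.
# A backup copy is kept as FILE.bak. Lines that are already commented
# out are left untouched, so the function can be run repeatedly.
comment_out_key() {
    sed -i.bak \
        's/^\([[:space:]]*\)\(find_highest_possible_evaluation_score:.*\)$/\1# \2/' \
        "$1"
}

# Apply the edit to every YAML file in BATCH_DIR, if any exist.
for f in "$BATCH_DIR"/*.yml; do
    if [ -f "$f" ]; then
        comment_out_key "$f"
    fi
done
```

Because the original files are kept with a .bak suffix, the change can be reverted by copying the backups over the edited files.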

groupschoof commented 9 years ago

Dear Sitaramrajaraman,

thank you for your quick response. The problem arose because the documentation was not followed. Please read section 2.3 Batcher carefully. The following example is given there:

java -cp ./dist/ahrd.jar ahrd.controller.Batcher ./batcher_input_example.yml

The YAML input file that serves as an example for using the AHRD Batcher is ./batcher_input_example.yml, not ./test/resources/batcher_input_test.yml. The latter file is used for unit testing and hence contains input data not intended for the actual application.

Section 2.3 of the manual also states:

You will have to edit ./batcher_input_example.yml and provide the following arguments…

If you had followed these instructions, the problem would not have arisen.

Have fun using AHRD. Cheers

sitaramrajaraman commented 9 years ago

Thank you for clarifying. I think I was using version 2.0 and not 2.0.1. My copy did not have batcher_input_example.yml. My README read like this:

h3. 2.2 Batcher

Start the Batcher with:
<pre>mkdir test/resources/batch_ymls 
java -cp ./dist/ahrd.jar ahrd.controller.Batcher ./test/resources/batcher_input_test.yml</pre>
You will have to edit ./test/resources/batcher_input_test.yml according to your needs.

I guess that was the source of the confusion. I will download this file and edit it according to my needs. Thanks once again.

groupschoof commented 9 years ago

Dear Sitaramrajaraman,

we are glad that this could be resolved.

Happy Holidays and much fun with AHRD.