iRNA-COSI / APAeval

Community effort to evaluate computational methods for the detection and quantification of poly(A) sites and the estimation of their differential usage across RNA-seq samples
MIT License

OpenEBench summary workflow: rename output consolidation file #152

Closed: AsierGonzalez closed this issue 3 years ago

AsierGonzalez commented 3 years ago

At the moment the output file is called {challenge_name}_summary.json (e.g. Q2_summary.json), which causes an issue with the visualisation because the JS code expects the file to be called {challenge_name}.json (e.g. Q2.json). Could you please change line 138 of manage_assessment_data.py to remove the "_summary" part?

- summary_dir = os.path.join(challenge_dir, challenge + "_summary.json")
+ summary_dir = os.path.join(challenge_dir, challenge + ".json")

I believe this is the only place in the code where this filename is referenced, but I haven't checked thoroughly.
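For illustration, here is a minimal sketch of the two naming schemes side by side. The helper `consolidation_path` and the directory `results/Q2` are hypothetical; the real code builds `summary_dir` inline in manage_assessment_data.py.

```python
import os


def consolidation_path(challenge_dir: str, challenge: str) -> str:
    """Hypothetical helper: build the path as {challenge}.json (e.g. Q2.json),
    the name the visualisation's JS code expects."""
    return os.path.join(challenge_dir, challenge + ".json")


# Old naming scheme, which the visualisation cannot find.
old_style = os.path.join("results/Q2", "Q2" + "_summary.json")
# New naming scheme.
new_style = consolidation_path("results/Q2", "Q2")

print(os.path.basename(old_style))  # Q2_summary.json
print(os.path.basename(new_style))  # Q2.json
```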

AsierGonzalez commented 3 years ago

I believe that line 30 in merge_data_model_files.py also needs to be updated:

- data_model_file = join_json_files(aggregation_dir, data_model_file, "*_summary.json")
+ data_model_file = join_json_files(aggregation_dir, data_model_file, ".json")

yuukiiwa commented 3 years ago

Thanks, @AsierGonzalez! This issue is addressed here: https://github.com/yuukiiwa/APAeval-summary-workflow/commit/b84d3bd4dc7876c726d5586ff4913c3594809f4a

AsierGonzalez commented 3 years ago

Thank you @yuukiiwa. I see you have changed manage_assessment_data.py but not merge_data_model_files.py. Do you think the latter is fine as is?

yuukiiwa commented 3 years ago

Yes, I have updated merge_data_model_files.py: https://github.com/yuukiiwa/APAeval-summary-workflow/commit/9dd1c3e4253f22004520e76c4fbd1a219cf8eba3 The Docker container has also been updated. Thank you!

AsierGonzalez commented 3 years ago

The changes had the desired effect, so I'm closing the ticket. Thank you for the great work @yuukiiwa!

AsierGonzalez commented 3 years ago

I have just realised that something is missing from the output, and I believe it's related to a missing asterisk in the second change I suggested. The call to join_json_files() on line 30 of merge_data_model_files.py should have "*.json" as the third argument instead of ".json":

data_model_file = join_json_files(aggregation_dir, data_model_file, "*.json")

It was my bad, apologies.
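Why the asterisk matters can be sketched with Python's `glob` module, assuming (this is not confirmed in the thread) that join_json_files() passes its third argument to a glob-style lookup in aggregation_dir. The file names below are hypothetical placeholders.

```python
import glob
import os
import tempfile

# Hypothetical file names; the real contents of aggregation_dir may differ.
names = ["Q1.json", "Q2.json", "tool_A_Q2_assessment.json"]

with tempfile.TemporaryDirectory() as aggregation_dir:
    for name in names:
        # Create empty placeholder files to glob against.
        open(os.path.join(aggregation_dir, name), "w").close()

    # ".json" contains no wildcard, so glob only matches a file literally named ".json".
    without_star = glob.glob(os.path.join(aggregation_dir, ".json"))
    # "*.json" matches every non-hidden file whose name ends in ".json".
    with_star = sorted(
        os.path.basename(p)
        for p in glob.glob(os.path.join(aggregation_dir, "*.json"))
    )

print(without_star)  # []
print(with_star)     # ['Q1.json', 'Q2.json', 'tool_A_Q2_assessment.json']
```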

yuukiiwa commented 3 years ago

No problem! Thanks for catching this too! I have updated it here: https://github.com/yuukiiwa/APAeval-summary-workflow/commit/28eda1d51d90d94c2bc6a6c5d6e146b40190ba93

AsierGonzalez commented 3 years ago

Unfortunately, this change did not solve the issue: now that the file name no longer contains "_summary", the "*.json" pattern matches more files than we need. Since the file we need contains the challenge name, the solution is to extend the pattern with the challenge name, so that the call to join_json_files() looks something like this:

data_model_file = join_json_files(aggregation_dir, data_model_file, "*" + challenge + ".json")

This requires changes to the benchmark_consolidation process in main.nf as well as to merge_data_model_files.py.
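The effect of adding the challenge name to the pattern can be sketched with `fnmatch.fnmatchcase`, which applies the same shell-style wildcard rules as glob without touching the filesystem. The file list and the `select` helper are hypothetical, and the sketch assumes join_json_files() does glob-style matching.

```python
from fnmatch import fnmatchcase

# Hypothetical contents of aggregation_dir; the real file names may differ.
files = ["Q1.json", "Q2.json", "tool_A_Q2_assessment.json", "README.txt"]


def select(pattern: str) -> list[str]:
    """Hypothetical helper: return the names a glob-style pattern selects (case-sensitive)."""
    return [name for name in files if fnmatchcase(name, pattern)]


challenge = "Q2"
broad = select("*.json")                    # every JSON file in the directory
narrow = select("*" + challenge + ".json")  # only names ending in "Q2.json"

print(broad)   # ['Q1.json', 'Q2.json', 'tool_A_Q2_assessment.json']
print(narrow)  # ['Q2.json']
```

Note that the leading "*" still lets any name ending in the challenge name plus ".json" match (for example a hypothetical "AQ2.json"), so this approach assumes no challenge name is a suffix of another file name in the directory.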

AsierGonzalez commented 3 years ago

I have made the necessary changes and opened a PR that fixes this issue. I have tested the changes both locally and on the OpenEBench infrastructure, and they work as expected.