pirovc / metameta


Question on interpreting the final metametamerge output file. #11

Closed: muslih14 closed this issue 6 years ago.

muslih14 commented 6 years ago

In the final output file, final.metametamerge.profile.out.detailed, what does the value -1 mean? I understand that 0 means absent and that values between 0 and 1 are the abundances obtained with a specific tool.

pirovc commented 6 years ago

In the detailed reports, -1 stands for "not in the database". For example, if a certain species is marked with -1 for a specific tool, that species is not available in that tool's database.

By default, metametamerge merges only species-level results (parameter ranks in the configuration file), so all other ranks (genus and above) will also be marked with -1, since they were only estimated from the species results.
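For anyone scripting over the detailed report, the three cases above can be told apart with a tiny helper. This is a minimal sketch: the function name describe_value is hypothetical, and it assumes each tool column holds one number per taxon, as described in the answer.

```python
def describe_value(value: float) -> str:
    """Translate one tool's value from the detailed report into its meaning.

    Hypothetical helper; encodes the convention described above:
    -1 = taxon not in that tool's database, 0 = absent, (0, 1] = abundance.
    """
    if value == -1:
        return "not in this tool's database"
    if value == 0:
        return "absent"
    if 0 < value <= 1:
        return f"abundance {value}"
    raise ValueError(f"unexpected value in detailed report: {value}")


# Example: one species row, with a value per tool
row = {"dudes": 0.12, "kraken": 0, "clark": -1}
for tool, value in row.items():
    print(tool, describe_value(value))
```

Keep in mind that with the default species-only merging, -1 at genus level and above does not mean the taxon is missing from the database, only that those ranks were not merged directly.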

muslih14 commented 6 years ago

Thanks, that makes perfect sense. I've got another question: I'd like to assess how long each tool takes to process the data, but the log and time files seem a bit cryptic.

  1. What do the .rpt files signify?
  2. Why are there multiple log and time files for each run, e.g. dudes_run_1.time, dudes_run_2.time and so on?
  3. Can I concatenate these files to get the total time spent by each tool processing the data?
  4. Is there a way to concatenate the final.metametamerge.profile.out.detailed files from multiple samples into one, just as can be done with the final.metametamerge.profile.out file? (The merge script doesn't work for the *detailed file.)

Thank you so much for this tool.

pirovc commented 6 years ago

You're welcome; I hope the tool is useful.

Here are the answers:

1 and 2: MetaMeta outputs a log file for each rule in the pipeline. In the DUDes example, metameta has one rule to align the reads against the index (dudes_run_1), one for the analysis (dudes_run_2) and one for converting the tool output into a standard format (dudes_rpt) that is parsed by metametamerge.

3: Yes, summing them up will give you the total execution time. You can do that with the following:

cd workdir/sample/log/
grep "^[0-9].*" *.time | cut -f 1 | awk -F ":" '{split($1,t,"_"); sum[t[1]]+=$2}END{for(s in sum) print s, sum[s]}'
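If you'd rather do the same in Python, here is a minimal sketch mirroring the one-liner above. Like the awk command, it assumes that the relevant line in each .time file starts with a digit, that the first tab-separated field on that line is the time to sum, and that the tool name is the part of the filename before the first underscore. The function name total_time_per_tool is hypothetical.

```python
import glob
import os
from collections import defaultdict


def total_time_per_tool(log_dir: str) -> dict:
    """Sum the first field of every digit-leading line in *.time files, per tool.

    Hypothetical helper mirroring the grep | cut | awk pipeline:
    'dudes_run_1.time' and 'dudes_run_2.time' both count towards 'dudes'.
    """
    totals = defaultdict(float)
    for path in glob.glob(os.path.join(log_dir, "*.time")):
        tool = os.path.basename(path).split("_")[0]
        with open(path) as fh:
            for line in fh:
                # Same filter as grep "^[0-9].*": skip header lines
                if line[:1].isdigit():
                    # Same as cut -f 1: keep only the first tab-separated field
                    totals[tool] += float(line.split("\t")[0])
    return dict(totals)


if __name__ == "__main__":
    for tool, total in sorted(total_time_per_tool("workdir/sample/log/").items()):
        print(tool, total)
```

The unit of the result is whatever unit the first column of the .time files uses; the script only adds the numbers up.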

4: There is no script to merge the detailed files yet. It should be easy to do by reading those files with Python and pandas (https://pandas.pydata.org/pandas-docs/stable/merging.html#set-logic-on-the-other-axes). I'll try to add a script for that in the next release.
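Until such a script exists, a pandas sketch along those lines might look like this. It assumes each detailed file is a tab-separated table with a single header row; if your files carry extra metadata lines, skip them before parsing. The function name merge_detailed and the paths in the usage line are hypothetical placeholders. Passing join="outer" to pd.concat is the set logic from the linked page: a tool column that appears in only some samples is kept and filled with NaN elsewhere.

```python
import pandas as pd


def merge_detailed(paths_by_sample: dict) -> pd.DataFrame:
    """Stack several detailed reports into one long table with a 'sample' column.

    Hypothetical helper; assumes tab-separated files with one header row.
    """
    frames = []
    for sample, path in paths_by_sample.items():
        df = pd.read_csv(path, sep="\t")
        df.insert(0, "sample", sample)  # remember which sample each row came from
        frames.append(df)
    # join="outer" keeps the union of columns across all samples;
    # sort=False preserves the column order in which they first appear
    return pd.concat(frames, join="outer", ignore_index=True, sort=False)


if __name__ == "__main__":
    merged = merge_detailed({
        "sample1": "path/to/sample1/final.metametamerge.profile.out.detailed",
        "sample2": "path/to/sample2/final.metametamerge.profile.out.detailed",
    })
    merged.to_csv("all_samples.detailed.tsv", sep="\t", index=False)
```

Note that NaN here means "this tool's column was not in that sample's file", which is different from the -1 ("not in the database") and 0 ("absent") values written by metametamerge.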

Cheers, Vitor

muslih14 commented 6 years ago

It sure is. Thank you, Vitor!

Best wishes, Muslih.