Compare the different average precision calculation scripts

niranjchandrasekaran commented 1 year ago

For this exercise, I used 40 plates from CPJUMP1 (2 cell types x 2 time points x (4 compound + 4 CRISPR + 2 ORF plates)). I chose these plates partly because they form a good-sized dataset for testing the scripts, and partly because matric had previously been run on them.

Before running the notebooks in this repo, clone the data repo outside the root of this repo. Then run the create-parquet notebook to create a parquet file with profiles from all 40 plates.

I compared matric with copairs and two versions of my own average precision (AP) scripts: a non-vectorized version, which was used in the current version of the CPJUMP1 paper repo, and a new version that vectorizes the AP calculation.
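To make the vectorized versus non-vectorized distinction concrete, here is a minimal sketch of average precision over a ranked list of 0/1 relevance labels. It is not the code from any of the scripts compared here; the function names are hypothetical, and it uses only the standard library. The first function walks the ranking one item at a time, while the second derives the same value from a running total of hits, which is the formulation that array libraries can vectorize.

```python
from itertools import accumulate


def average_precision_loop(relevance):
    """AP computed one rank at a time (non-vectorized style)."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def average_precision_cumsum(relevance):
    """Same AP from a running total of hits (the form that vectorizes)."""
    cumulative_hits = list(accumulate(relevance))
    total_hits = cumulative_hits[-1] if cumulative_hits else 0
    if total_hits == 0:
        return 0.0
    # Precision at rank k is (hits up to k) / k, kept only at relevant ranks.
    precision_at_hits = [
        hits / rank
        for rank, (hits, is_relevant) in enumerate(
            zip(cumulative_hits, relevance), start=1
        )
        if is_relevant
    ]
    return sum(precision_at_hits) / total_hits


# Hypothetical ranking: 1 marks a matching profile, 0 a non-matching one.
ranked = [1, 0, 1, 0, 0, 1]
ap_loop = average_precision_loop(ranked)
ap_cumsum = average_precision_cumsum(ranked)
print(ap_loop, ap_cumsum)
```

Both functions return the same value for the same ranking, so the two formulations should differ only in execution time.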

Comparing the results

Notebook: The average precision values from all the scripts are the same.

Execution time

I don't know how long it takes matric to run, but the following are the execution times of the other scripts.

Script	Execution time
Matric	not available
Copairs	~1 min
vectorized	~2.5 min
non-vectorized	~21.5 min
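As a quick reading aid, the approximate times above can be turned into relative speedups against the non-vectorized script. The numbers are the rounded values from the table, so the ratios are only rough; matric is left out because its run time is not available.

```python
# Approximate execution times in minutes, taken from the table above.
execution_minutes = {
    "copairs": 1.0,
    "vectorized": 2.5,
    "non-vectorized": 21.5,
}

# Speedup of each script relative to the non-vectorized baseline.
baseline = execution_minutes["non-vectorized"]
speedup = {
    name: baseline / minutes for name, minutes in execution_minutes.items()
}

for name, factor in speedup.items():
    print(f"{name}: {factor:.1f}x relative to non-vectorized")
```

On these rounded numbers, the vectorized script is roughly 8.6 times faster than the non-vectorized one, and copairs roughly 21.5 times faster.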

Features of each script

Features	Matric	Copairs	My vectorized script
Language	R	Python	Python
Speed	Not available	Very fast	Fast
Multiple matching columns	Yes	Requires some preprocessing	Yes
Multi label	Yes	Yes	Yes
Average precision calculation	Yes	Yes	Yes
p value	Yes	Yes	Yes
Adjusted average precision	Yes	No	Yes
Available as a package?	Yes	Yes	No
Optimized for speed?	Yes	Yes	No
Ease of use for Python users in the lab	Difficult	Easy	Easy
Has it been extensively tested?	Yes	No	No
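The table lists a p value as a feature of all three scripts, but this thread does not describe how any of them computes it. One common approach, shown here only as an illustrative assumption, is a permutation test: shuffle the relevance labels many times and count how often a random ranking scores at least as high as the observed one. The function names, the number of permutations, and the seed are all hypothetical choices.

```python
import random


def average_precision(relevance):
    """AP of a ranked list of 0/1 relevance labels."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def permutation_p_value(relevance, n_permutations=1000, seed=0):
    """Estimate how often a random ranking scores at least the observed AP."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = average_precision(relevance)
    shuffled = list(relevance)
    at_least_as_high = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if average_precision(shuffled) >= observed:
            at_least_as_high += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_high + 1) / (n_permutations + 1)


# Hypothetical rankings: three matches at the top versus at the bottom.
p_perfect = permutation_p_value([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
p_worst = permutation_p_value([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])
print(p_perfect, p_worst)
```

A ranking with all matches at the top should give a small p value, while one with all matches at the bottom gives a p value of 1, since every shuffled ranking scores at least as high.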

niranjchandrasekaran commented 1 year ago

@johnarevalo I compared the different scripts we currently have for AP calculation. I have summarized my observations and findings. I hope this will help us decide our strategy going forward. Let me know if I have gotten any details about copairs wrong. We can then talk to Alex and Shantanu about it.

niranjchandrasekaran commented 1 year ago

After discussing with Shantanu and Alex, we decided to use copairs for all the Morphmap analyses. John will create a notebook with an example showing how to load and use copairs.

niranjchandrasekaran / compare_ap