Metadata reorganisation to improve reliability

Following on from #104 and discussions with @niranjchandrasekaran, @shntnu and @johnarevalo, we came up with some steps to ensure that any ID we query can be associated with a set of images.

To ensure the robustness of all datasets:
- [ ] Add the CPG ID to the PLATE and WELL metadata tables so that each well has a unique path.
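
A minimal sketch of this step, assuming the tables are csv.gz files with `Metadata_Source`, `Metadata_Plate` and `Metadata_Well` columns, and that the new column is called `Metadata_CPG` (a hypothetical name). It adds the CPG ID to every row and checks that the resulting (CPG, source, plate, well) key is unique; all helper names are illustrative only.

```python
import csv
import gzip

# Hypothetical name for the new column holding the CPG ID.
CPG_COLUMN = "Metadata_CPG"
# Assumed existing columns that, together with the CPG ID, should locate one well.
WELL_COLUMNS = ("Metadata_Source", "Metadata_Plate", "Metadata_Well")


def read_csv_gz(path: str) -> list[dict[str, str]]:
    """Read a gzipped CSV file into a list of row dictionaries."""
    with gzip.open(path, "rt", newline="") as handle:
        return list(csv.DictReader(handle))


def add_cpg_id(rows: list[dict[str, str]], cpg_id: str) -> list[dict[str, str]]:
    """Return copies of the rows with the CPG ID added as a new first column."""
    return [{CPG_COLUMN: cpg_id, **row} for row in rows]


def assert_unique_wells(rows: list[dict[str, str]]) -> None:
    """Raise ValueError if two rows share the same (CPG, source, plate, well) key."""
    seen: set[tuple[str, ...]] = set()
    for row in rows:
        key = (row[CPG_COLUMN],) + tuple(row[col] for col in WELL_COLUMNS)
        if key in seen:
            raise ValueError(f"duplicate well: {key}")
        seen.add(key)
```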

To deal with the missing JCP IDs:
- [ ] ORF: Include the related JUMP pilot metadata in both the PLATE and WELL tables.
- [ ] COMPOUNDS: Drop the 957 compounds that were not actually used. @afermg will do this.
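
Both items can be checked with a small sketch, assuming the compound and well tables carry the JCP ID in a `Metadata_JCP2022` column and that "not actually used" means "not referenced by any well". `drop_unused_compounds` returns the kept rows plus the number dropped (the real count comes from the real tables), and `wells_missing_jcp` lists wells whose JCP ID is empty or cannot be resolved, e.g. ORF wells whose pilot metadata has not been added yet. Both function names are hypothetical.

```python
# Assumed name of the column holding the JCP identifier in both tables.
JCP_COLUMN = "Metadata_JCP2022"


def drop_unused_compounds(
    compounds: list[dict[str, str]], wells: list[dict[str, str]]
) -> tuple[list[dict[str, str]], int]:
    """Keep only compounds whose JCP ID appears in at least one well."""
    used = {row[JCP_COLUMN] for row in wells}
    kept = [row for row in compounds if row[JCP_COLUMN] in used]
    return kept, len(compounds) - len(kept)


def wells_missing_jcp(
    wells: list[dict[str, str]], known_ids: set[str]
) -> list[dict[str, str]]:
    """Return wells whose JCP ID is empty or absent from the reference tables."""
    return [
        row
        for row in wells
        if not row.get(JCP_COLUMN) or row[JCP_COLUMN] not in known_ids
    ]
```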

Decision time:

Should we use this opportunity to change the file format?
Pros:
- A Parquet file, for instance, lets us query columns independently of each other, obviating the need to download the whole dataset.
- An SQLite file is a single self-contained database that can be opened by most database software (note that, unlike csv.gz, it is not compressed by default).

Cons:
- The main motivation for using csv.gz is to reduce friction for biologists who want to access the metadata. This can be mitigated by providing a WebAssembly tool that fetches the data in their browser (akin to broad.io/babel or Ank's DuckDB system).

Please let me know if you have any opinions on this: depending on our decision, I may need to write a script to convert the csv.gz files into a different format.
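
If we settle on SQLite, the conversion script could be as small as the following sketch, which uses only the Python standard library. The function name `csv_gz_to_sqlite` and the one-table-per-file layout are assumptions for illustration, not an agreed design; table and column names are interpolated into SQL, so it assumes trusted input.

```python
import csv
import gzip
import sqlite3


def csv_gz_to_sqlite(csv_path: str, db_path: str, table: str) -> int:
    """Load one csv.gz file into a SQLite table and return the rows inserted.

    Every column is stored as TEXT so identifiers such as well names
    are kept exactly as they appear in the CSV.
    """
    conn = sqlite3.connect(db_path)
    try:
        with gzip.open(csv_path, "rt", newline="") as handle:
            reader = csv.reader(handle)
            header = next(reader)
            columns = ", ".join(f'"{name}" TEXT' for name in header)
            placeholders = ", ".join("?" for _ in header)
            with conn:  # commits the transaction if no exception is raised
                conn.execute(f'DROP TABLE IF EXISTS "{table}"')
                conn.execute(f'CREATE TABLE "{table}" ({columns})')
                cursor = conn.executemany(
                    f'INSERT INTO "{table}" VALUES ({placeholders})', reader
                )
        return cursor.rowcount
    finally:
        conn.close()
```

A Parquet conversion would follow the same shape but needs a third-party library such as pyarrow.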

jump-cellpainting / datasets

Metadata reorganisation to improve reliability #105