kienerj / pycdxml

Tools to automatically convert and process cdx and cdxml files in Python
GNU General Public License v3.0
Feature request: could CDXMLSlideGenerator add additional slidedecks when the number of molecules is too large #27

Open baoilleach opened 1 year ago

baoilleach commented 1 year ago

Hi @kienerj, I'm wondering what to do when I have so many properties that the molecule depiction starts to shrink, and separately, if I have too many molecules to fit on the slidedeck.

For the first, it's probably sufficient for me to just request a single row, but then I can't fit many molecules into the generated cdxml file. Which brings me to the second point: would you consider extending the size of the page to accommodate additional 'slidedecks' below the first one when the number of molecules is larger than the space provided?

kienerj commented 1 year ago

Yes, currently it is very barebones, and it is up to the caller to find a suitable cdxml size and to iterate if the number of molecules does not fit in one cdxml document.

The assumption is that your cdxml has a fixed size due to the target document it will be used in. You then determine a suitable number of rows and columns and iterate over your set of molecules. Each call generates one cdxml document only.

The question is, would you want CDXMLSlideGenerator to:

The latter (a single document that grows to fit all molecules) would be harder, as it would be a different "algorithm". Plus, it needs to be defined in which direction the document can grow, probably only vertically?

The multiple-documents option could, I think, be done relatively easily as a convenience method that simply returns a list of cdxml documents (strings). This would then be the "normal" use-case. I would still keep the current method exposed for "edge-cases" that process a lot of molecules and might not want them all stored as cdxml in memory, but rather want to write each result to a file immediately.
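A minimal sketch of what such a convenience method might look like. Note that `render_slide` is a hypothetical stand-in for the existing one-document-per-call method, not the actual pycdxml API; only the batching logic is being illustrated.

```python
from typing import Callable, List, Sequence


def generate_slides(
    molecules: Sequence[str],
    rows: int,
    columns: int,
    render_slide: Callable[[Sequence[str]], str],
) -> List[str]:
    """Split molecules into batches of rows * columns and render each batch.

    render_slide is a hypothetical callable standing in for the existing
    single-document method: it takes one batch and returns one cdxml string.
    """
    if rows < 1 or columns < 1:
        raise ValueError("rows and columns must both be at least 1")
    per_slide = rows * columns
    # One call to render_slide per batch; the last batch may be partly empty.
    return [
        render_slide(molecules[start:start + per_slide])
        for start in range(0, len(molecules), per_slide)
    ]
```

With a 2 x 2 layout and seven molecules this would produce two documents, the second holding the remaining three molecules.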

baoilleach commented 1 year ago

I guess my use-case is a bit different than what the software is designed for. I don't expect the users to be copying and pasting the entire ChemDraw file into a slide in PowerPoint. Instead they will pick out molecules one at a time (with their properties) and paste them into different slides, or a few on this slide and a few on another, and rearrange them in various ways. For this use-case, having a single document is much more convenient (growing vertically, I agree). I appreciate that it would be a change in approach. I can live with it as it is (for now I can just return a zip file of multiple cdxmls), but this would be a nice-to-have.

Regarding the convenience method, if you are worried about memory, you could maybe do this as a generator with yield, so the user could call next() and there would only ever be one file in memory at a time.
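Roughly what I have in mind, as a sketch only: `render_slide` is again a hypothetical placeholder for whatever method produces a single cdxml string, and `per_slide` would be rows times columns.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def iter_slides(
    molecules: Iterable[str],
    per_slide: int,
    render_slide: Callable[[List[str]], str],
) -> Iterator[str]:
    """Lazily yield one cdxml string per batch of molecules.

    render_slide is a hypothetical callable, not part of pycdxml.
    Nothing is rendered until the caller asks for the next document.
    """
    if per_slide < 1:
        raise ValueError("per_slide must be at least 1")
    remaining = iter(molecules)
    while True:
        batch = list(islice(remaining, per_slide))
        if not batch:
            return
        yield render_slide(batch)
```

Calling `next()` on the result renders exactly one batch, and a `for` loop that writes each string to a file before moving on keeps only one document alive at a time.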

kienerj commented 1 year ago

In the case of returning one document, you would still want to define the number of columns, e.g. the number of molecules side by side?

And is the shrinking an issue? Avoiding shrinking would mean rows of different heights, as the chemical structure drawings take up a variable amount of space, and there would be no way to compute the document's size beforehand. It would also mean molecules in the same row likely not being aligned.

What you could do right now, instead of returning multiple files, is simply define a very large (= tall) slide and give each row enough space so that no shrinking is needed, and then fix these settings for production? E.g. empirically (by trial and error) determine a suitable row height from the number of properties the user selected and the size of your molecules.
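To make the arithmetic concrete, here is a rough sketch. The default values for `structure_height` and `line_height` are made-up placeholders to be tuned by trial and error, in whatever units the slide size is given in, and none of these names come from pycdxml.

```python
import math


def tall_slide_height(
    n_molecules: int,
    columns: int,
    n_properties: int,
    structure_height: float = 150.0,  # placeholder: space reserved per drawing
    line_height: float = 12.0,        # placeholder: space per property line
) -> float:
    """Return a slide height that gives every row room for a drawing
    plus one text line per selected property."""
    if columns < 1:
        raise ValueError("columns must be at least 1")
    rows = math.ceil(n_molecules / columns)
    row_height = structure_height + n_properties * line_height
    return rows * row_height
```

The result would be passed as the slide height, with the row count set to the same `rows` value, so that each row ends up with the intended height.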

kienerj commented 1 year ago

I have added a generate_document method. It simply acts as a slide with a variable number of rows. The height of each row is determined from the instance's slide_height and rows properties.

This way one document can be created, but molecules are still shrunk if they do not fit. You could potentially calculate in advance the maximum space needed for the user-selected molecules + properties and determine the row height that way. The issue is with very dissimilar molecules, i.e. drawings that use very different amounts of vertical space: there is no way to make it look good. Either some will be shrunk or others will have a lot of white-space.

You may also want to check the commit prior to the one fixing this. There is/was an issue with hetero atom labels getting cut off or overlapping with text. I added some additional general margin, so white-space will increase a bit, in case you are wondering why.
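The trade-off can be put in numbers. The sketch below assumes you already have an estimated drawing height for each molecule (how to obtain it is outside this sketch) and that a drawing taller than the available row space is scaled down uniformly to fit. The function name and its inputs are hypothetical, not part of pycdxml.

```python
from typing import Dict, List, Sequence


def row_height_tradeoff(
    heights: Sequence[float], row_height: float
) -> List[Dict[str, float]]:
    """For each drawing, report its scale factor and the unused vertical space.

    heights are estimated drawing heights (an assumed input);
    row_height is the vertical space available for the drawing in a row.
    """
    report = []
    for height in heights:
        if height <= 0:
            raise ValueError("drawing heights must be positive")
        scale = min(1.0, row_height / height)  # below 1.0 means shrunk
        unused = row_height - height * scale   # white-space left in the row
        report.append({"scale": scale, "unused": unused})
    return report
```

Setting `row_height = max(heights)` avoids all shrinking at the cost of white-space around the smaller molecules; any lower value shrinks the tallest ones.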