nel-lab / mesmerize-core

High level pandas-based API for batch analysis of Calcium Imaging data using CaImAn
Other
59 stars 15 forks source link

batch files cannot be used cross-platform #209

Open kushalkolar opened 1 year ago

kushalkolar commented 1 year ago

I think it's because the current batch path is stored in DataFrame.attrs

https://github.com/nel-lab/mesmerize-core/blob/f00949fa87886b1254def96a93e47e09526f0357/mesmerize_core/batch_utils.py#L59-L60

should just store this as a string and have a @batch_path property on the extension class which returns a Path instead of str to use within the extension functions.

ethanbb commented 3 months ago

Hey, I just ran into this issue as well.

I was thinking, your solution would avoid the error, but it's still not very useful to have the string representation of a path that's not usable on your current system. The batch path gets overwritten when loading a batch anyway right after un-pickling:

df = pd.read_pickle(Path(path))
df.paths.set_batch_path(path)

Thus how about this attribute just gets cleared before saving? Each df.to_pickle call would have to be replaced with a custom save function. If that sounds OK, I can take a stab at it.

ethanbb commented 3 months ago

By the way, how do you quote specific lines with the line numbers and link to file? I've been wishing there was a feature like that.

kushalkolar commented 3 months ago

not very useful to have the string representation of a path that's not usable on your current system. The batch path gets overwritten when loading a batch anyway right after un-pickling:

Ah yes you're right :laughing: nice to have fresh eyes take a look at this

By the way, how do you quote specific lines with the line numbers and link to file? I've been wishing there was a feature like that.

https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-a-permanent-link-to-a-code-snippet

ethanbb commented 3 months ago

I just realized that the outputs are also saved as Paths, e.g.:

https://github.com/nel-lab/mesmerize-core/blob/38f6ebebf6dc38bc5f9312751f2848952711807e/mesmerize_core/algorithms/cnmf.py#L118C9-L134C10

Since I believe these are all relative paths, which should be cross-platform if they use forward slashes, I think the solution is to convert the relative paths to PurePosixPaths and then strings before saving.

Edit: or just save as PurePosixPath, that should also be fine.

kushalkolar commented 3 months ago

That's a good idea! Posix path and then string.