Python: Library separate from application?

nickmckay / LiPD-utilities

Input/output and manipulation utilities for LiPD files in Matlab, R and Python

http://nickmckay.github.io/LiPD-utilities/

GNU General Public License v2.0


Python: Library separate from application? #39

Closed: brews closed this issue 6 years ago

brews commented 6 years ago

Is it possible for a general developer-oriented LiPD library to be packaged separately from end-user-oriented LiPD utilities and applications?

I may have use for LiPD within a much larger analysis system. It would be nice to have a library or framework for writing project-specific applications that read, write, and parse LiPD files as a standardized object.

nickmckay commented 6 years ago

Hey Brewster. I guess I'm not sure what you're asking for. The utilities here include both the developer functionality and the user-oriented stuff that's overlain, although @chrismheiser knows much more about the details in Python. It's possible/likely that what you're looking for is here, but just not obvious or documented. If it would be helpful, I'd be happy to have a quick Skype call with you and @chrismheiser to figure out what you're trying to do and how we can help.

chrismheiser commented 6 years ago

If I'm understanding right, you should be able to include a LiPD package dependency in whatever you're doing, through PyPI or manually, and use whichever functions you please. We used to have some automated prompts and such overlaying the functions/package, but that's no longer the case.

I'm not sure if you've seen these docs yet, but they're a bit more in depth than the core repository for the overall LiPD Utilities:

http://nickmckay.github.io/LiPD-utilities/python/index.html

brews commented 6 years ago

Thanks for the quick response, really appreciate it.

Thanks also for the link @chrismheiser. It looks like I may need to just hang tight and wait for the documentation to catch up to the code.

I am hoping for a developer-friendly API to use for LiPD - a standard and clear way to read, validate, and write LiPD objects in Python. I think the issue I have is that the user interface and application feel very tightly coupled to the backend code. For example, as soon as I import lipd, my working directory is populated with "benchmark.log" and "debug.log". This is behavior I might expect from an application but not from a library, and it makes things difficult if I want to run this as a component of a large and tight framework on a server or cluster, for instance.
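For illustration, here is a minimal sketch of the convention Python libraries commonly follow to avoid this kind of import-time side effect: the library logs through a named logger that carries only a `NullHandler`, and leaves any file output to the application. This is not lipd's code; the logger name `example_lib` and the function `read_record` are hypothetical.

```python
import logging

# Hypothetical logger name; a real package would typically use __name__.
logger = logging.getLogger("example_lib")

# NullHandler discards records at the library level, so importing the
# module creates no log files and configures nothing globally.
logger.addHandler(logging.NullHandler())


def read_record(path):
    """Hypothetical library function: it logs, but never configures logging."""
    logger.debug("reading %s", path)
    return {"path": path}
```

An application that does want a log file can opt in on its own side, for example with `logging.basicConfig(filename="debug.log", level=logging.DEBUG)`, while a server or cluster job can leave logging unconfigured and get no files at all.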

I realize this is a niche thing and something that's not easily added, but it would help make lipd easier to develop for.

I appreciate y'all's work on this. And thanks again for taking this feedback.

CommonClimate commented 6 years ago

@brews this is exactly the feedback that we need, and I agree that the log files are a pain in the neck. They are debug features, I get it, but I wish they could be exported only if a --debug option is turned on (or something like that). An alternative would be to call them .lipd_debug and .lipd_benchmark, so they are less visible.

In any case, one of the persistent design problems of lipd and associated libraries (e.g. pyleoclim) is that we expect to serve both basic users with virtually no programming background and power users like yourself. I am strongly of the opinion that the library should serve power users (which includes documentation!), but be ergonomic enough that the most essential commands (e.g. readLipd) have intuitive default behavior. We're pretty close to it, I think, except for this business of log files and managing the input directory, which was there to support non-power users.

chrismheiser commented 6 years ago

Any other direct suggestions on what needs to change to make this more useful?

I've moved the logs to only start from lipd.debug(), instead of automatically.
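As a rough sketch of what opt-in debug logging can look like in general (this is not the actual implementation behind lipd.debug(), which the thread does not show), a library can attach a file handler only when the caller asks for one. The names `example_lib`, `enable_debug`, and `disable_debug` are hypothetical.

```python
import logging
import os
import tempfile

# Hypothetical library logger; silent until the caller opts in.
logger = logging.getLogger("example_lib")
logger.addHandler(logging.NullHandler())


def enable_debug(log_dir):
    """Hypothetical helper: start writing debug.log only on request."""
    handler = logging.FileHandler(os.path.join(log_dir, "debug.log"))
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    return handler


def disable_debug(handler):
    """Detach and close the handler returned by enable_debug."""
    logger.removeHandler(handler)
    handler.close()


# Usage: the log file is created in a directory the caller chooses,
# never in the working directory as a side effect of import.
with tempfile.TemporaryDirectory() as tmp:
    handler = enable_debug(tmp)
    logger.debug("debug logging is on")
    disable_debug(handler)
    log_exists = os.path.exists(os.path.join(tmp, "debug.log"))
```

Letting the caller pass the directory also addresses the visibility concern raised above, since the files no longer need to land in the working directory.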

The only other things I see as problematic are the global variables, which are

    _timeseries_data = {}
    _settings = {"note_update": True, "note_validate": True, "verbose": True, "debug": False}
    _logs = {"start": None, "benchmark": None}
    _cwd = os.getcwd()
    _files = {".txt": [], ".lpd": [], ".xls": []}

_timeseries_data is a definite weak point, but there's not really any better way to handle that except for global storage. _settings and _logs can be moved to an "options" parameter on each function, putting it on the user to decide what they want. _cwd, I believe, was something that Deborah wanted in Spyder, but it doesn't serve a whole lot of purpose. _files is another convenience variable that can be worked around. It was added to make managing multiple files and file types in the workspace easier, but that can be done in individual read functions.
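To make the "options" parameter idea concrete, here is a minimal sketch under stated assumptions. The `Options` fields mirror the keys of the `_settings` dict above; the `Workspace` class and the `register_file` function are hypothetical and only illustrate one possible way for the caller, rather than the module, to own the state. None of this is lipd's API.

```python
from dataclasses import dataclass, field


@dataclass
class Options:
    """Per-call settings, mirroring the keys of the _settings dict."""
    note_update: bool = True
    note_validate: bool = True
    verbose: bool = True
    debug: bool = False


@dataclass
class Workspace:
    """Hypothetical container for what the module globals held."""
    timeseries_data: dict = field(default_factory=dict)
    files: dict = field(
        default_factory=lambda: {".txt": [], ".lpd": [], ".xls": []}
    )


def register_file(workspace, filename, options=None):
    """Hypothetical function: record a file by extension in the workspace."""
    options = options or Options()
    for ext, names in workspace.files.items():
        if filename.endswith(ext):
            names.append(filename)
            if options.verbose:
                print(f"registered {filename}")
            return True
    return False


# Usage: each caller builds its own state, so nothing is shared implicitly.
ws = Workspace()
register_file(ws, "core.lpd", Options(verbose=False))
```

Because every `Workspace` starts from fresh dictionaries, two components of a larger framework can each hold their own instance without interfering with one another.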

chrismheiser commented 6 years ago

@brews

http://nickmckay.github.io/LiPD-utilities/python/source/lipd.html#module-lipd

Not sure if you saw this piece of the docs. Sphinx generated the pages a little strangely, so these main functions are hard to find. I'll try to sort out the pages so they're easier to browse, and update the docs too.

brews commented 6 years ago

Thanks @chrismheiser. No other concrete changes at the moment, so I'll close this issue. I'll be sure to open another issue if I come across something else specific.

Thanks again!

chrismheiser commented 6 years ago

@brews Sounds good. I have made the noted changes and am testing them. I'll have those pushed soon, and the docs to follow sometime after.