sirherrbatka / vellum

Data Frames for Common Lisp
BSD 2-Clause "Simplified" License
75 stars 10 forks source link

dependencies? #1

Closed slyrus closed 3 years ago

slyrus commented 3 years ago

Vellum looks cool and I'd like to use it, possibly with both postgres and mcclim, but do we really want vellum to depend on these libraries? Seems like a general purpose dataframe-ish library shouldn't need an RDMBS or an application framework like McCLIM. Am I missing something?

slyrus commented 3 years ago

Put another way, the integration stuff seems like it should have it's own ASDF systems.

sirherrbatka commented 3 years ago

Agreed. I was being lazy. Having said that, refactoring this out should be (unless I messed up) fairly easy. If you want to take a stab at that you are welcome, otherwise I will do it myself in the future (before submitting to the quicklisp).

slyrus commented 3 years ago

Ok, thanks. As I'm taking a look at vellum, what's the best way to bring up questions/issues? Should I just add them to this issue? Create a new one each time? Take them to, e.g., #lisp?

slyrus commented 3 years ago
  1. what is parse-data-line in REAMDE.org?
  2. CSV reading is rather convoluted. Is there a simple interface to parse csv files? There's parse-csv-line and there's the from-string stuff but it isn't obvious how they're used.
  3. separate tests from implementation - tests live in the implementations packages now. Probably don't want that long term.
  4. Do we really need another CSV parsing library? If so, can we separate this one from vellum and make it the one true CSV parsing library? Too many overlapping half-baked libraries in the lisp world.
sirherrbatka commented 3 years ago

Well, I do hang on #lisp so if you feel like reaching it there feel free to do so. Having said that, I would prefer to keep the discussion on the github, if that's fine for you. As for refactoring, I planned to simply leave relevant defgeneric definitions in the main system, while moving defmethods into separate ones. All of the stuff that should be moved out is located in the integration module. Only cl-ds belongs there (and cl-ds should ALSO be divided into smaller systems, but that is a lot more involved task).

  1. parse-data-line is an example symbol intended to represent user defined function.
  2. in theory, yes you can use (vellum:copy-from :csv ...) and this is how I envisioned adding support for an future data source. I intentionally omitted this from the README as my CSV parser really, really, really sucks. I don't want people to use it. At least at the moment.
  3. hmmm, do you really think this a problem? I didn't notice negative consequences of this approach.
  4. I wish I could get rid of my CSV parser. However at the moment I don't know what to use. CL-CSV is slow, FARE-CSV lacks few of the options I need. If you know any other library that you can recommend I would gladly use it.
slyrus commented 3 years ago

Your items 2 and 4 above seem to be in conflict :) As for 3, it's not critical, but I like to think of it as manually "tree-shaking" to keep the tests in their own packages/systems.

sirherrbatka commented 3 years ago

Well, I think that 2 and 4 make sense in tandem. I agree that CSV support Is important (and perhaps not only CSV but also parquet files) and I wish to support it. This is would be actually pretty easy to do (and minimal number of code lines on the vellum side) if only there was a really good CSV parsing library. My attempt was built around a peculiar use case (that is: csv files that simply should not be parsable) and it really does not translate well into a normal usage. Also, not that fast. Not at all.

In summary, that's how I think: CSV support is a must, my parser is garbage and belongs to the garbage can, ideally use other parser, if not possible write your own, but as separate library and depend on it.

sirherrbatka commented 3 years ago

As for the test, I simply don't anticipate them to be loaded by non-developers. And if so, why I should even care about what package they are in?

slyrus commented 3 years ago

The point of the tests is that they're automatically loaded for all now, so if you split them out, they don't get loaded for non-developers.

sirherrbatka commented 3 years ago

Oooh. I forgot to make a separate asdf system for the tests. Well, that is 100% required for sure, still not convinced about the package.

sirherrbatka commented 3 years ago

Hm, no, i have a separate asdf file for the tests. I am confused.

slyrus commented 3 years ago

Ah, I see. Yes, there is a separate ASDF system. My thought is that the package (e.g. vellum.csv) shouldn't be polluted by the tests. Looking at parser-tests.lisp, perhaps it's not but I worry that test frameworks like to intern various things. Perhaps there's nothing to worry about here. Still, I tend to think it's good practice to have a separate package defined for the tests. Not a big deal now that I see that the file doesn't even get loaded unless you load the test ASDF system.

sirherrbatka commented 3 years ago

We can think about this later, but my impression is that we are talking about purely aesthetical issue.

Cyrus Harmon pisze:

Ah, I see. Yes, there is a separate ASDF system. My thought is that the package (e.g. vellum.csv) shouldn't be polluted by the tests. Looking at parser-tests.lisp, perhaps it's not but I worry that test frameworks like to intern various things. Perhaps there's nothing to worry about here. Still, I tend to think it's good practice to have a separate package defined for the tests. Not a big deal now that I see that the file doesn't even get loaded unless you load the test ASDF system.

sirherrbatka commented 3 years ago

Ok, let's get back to the dependencies question.

Should we create a separate repo for each feature (one for mcclim viewer, one for sql etc.)? If yes, should I install new github project called vellum?

sirherrbatka commented 3 years ago

I added new project called vellum, and two new repositories vellum-postmodern, vellum-clim. I removed postgres releated stuff from the dependencies of vellum itself and will do the same with clim. I will then move to CSV and plotting.

sirherrbatka commented 3 years ago

Ok, although new CSV is not yet completed I think that this issue has been effectively taken care of. I will close it now.