qzdl / Samplify

A sample retrieval engine, powered by whosampled.
GNU General Public License v3.0

Fix 'direction' parsing in scraper & finder #4

Closed qzdl closed 4 years ago

qzdl commented 5 years ago

Scraper

The scraper currently just assumes that the sections of the page appear in this fixed order:

  1. Contains samples of
  2. Sampled in
  3. Covered in
  4. Remixed in

Finder

It would be nice to support multiple directions here, e.g. 'remixes' and 'sampled in'.
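As a rough illustration of multi-direction support, here is a minimal sketch. It assumes each scraped relation is a dict with a `direction` key; the names `KNOWN_DIRECTIONS`, `normalise_directions` and `select_relations` are hypothetical and are not taken from the Samplify codebase.

```python
# Hypothetical direction names, taken from the section titles listed above.
KNOWN_DIRECTIONS = {"contains samples of", "sampled in", "covered in", "remixed in"}


def normalise_directions(directions):
    """Accept one direction string or an iterable of them; return a set of lowercased names."""
    if isinstance(directions, str):
        directions = [directions]
    wanted = {d.strip().lower() for d in directions}
    unknown = wanted - KNOWN_DIRECTIONS
    if unknown:
        raise ValueError(f"unknown direction(s): {sorted(unknown)}")
    return wanted


def select_relations(relations, directions):
    """Keep only the relations whose 'direction' is one of the requested directions."""
    wanted = normalise_directions(directions)
    return [r for r in relations if r["direction"] in wanted]
```

With this shape, a caller could pass either `"sampled in"` or `["remixed in", "sampled in"]`, and the finder would not need a separate code path for the single-direction case.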

qzdl commented 4 years ago

Building on the previous improvement, the approach taken centres on mapping each direction against the text of each header returned on the page. This works for UK English, but there is no coverage for other languages. I'd be really grateful for an API to provide a source of truth, but I've really enjoyed the process of refining the accuracy of this information retrieval mechanism. In retrospect, I think stepping back from the problem and considering the minimum and maximum information bounds would have led to a cleaner approach to the data structure. Instead, I chose to represent these relationships as mostly hierarchical, which adds overhead because selecting and filtering has to search the depth of the tree.
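For anyone reading along, the header-mapping idea could look roughly like the sketch below. This is not the code in the repository: the keyword table, the function names, and the `(header_text, items)` input shape are all assumptions for illustration, and it only covers English headers.

```python
# Hypothetical keyword -> direction table; English headers only.
HEADER_KEYWORDS = {
    "contains samples": "contains_samples_of",
    "sampled in": "sampled_in",
    "covered in": "covered_in",
    "remixed in": "remixed_in",
}


def direction_from_header(header_text):
    """Return the direction whose keyword appears in the header text, or None."""
    # Lowercase and collapse whitespace so small markup differences do not matter.
    text = " ".join(header_text.lower().split())
    for keyword, direction in HEADER_KEYWORDS.items():
        if keyword in text:
            return direction
    return None


def group_by_direction(sections):
    """Group items by direction, given (header_text, items) pairs in any page order."""
    grouped = {}
    for header_text, items in sections:
        direction = direction_from_header(header_text)
        if direction is None:
            continue  # Skip sections that are not a recognised direction.
        grouped.setdefault(direction, []).extend(items)
    return grouped
```

Because the lookup keys off the header text rather than the position of the section, a page that omits or reorders sections would still be labelled correctly. The result is also a flat dict keyed by direction, so selecting a direction is a single lookup rather than a tree search.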

Honestly though, performance and scalability have never been a priority for this project; I'm in no rush to generate the playlists I'll listen to for months, as the payoff is so large that I'd be happy to wait much longer than the ~20 seconds the process takes end-to-end. I think the importance should be placed on demonstrating some concept before ruminating about the optimal place. Sometimes we have to try again a few times, lest we stop on some local maximum, but if you can become so familiar with the problem that you can build towards it, then the way in which you compose the data structures will also bend towards the problem. I'm excited to see how this all goes in Clojure.