searx / searx

Privacy-respecting metasearch engine
https://searx.github.io/searx/
GNU Affero General Public License v3.0

What is the data lifecycle? #2052

Open dalf opened 4 years ago

dalf commented 4 years ago

Maybe I'm overthinking.

Which data?

Data and searx installation

Once the data are updated in the git repository and searx is cloned / installed, the data stay the same until searx itself is updated.

How can the data be updated more often?

  1. Do nothing; keep the current process.
  2. Keep the data in the searx git repository and add a make data.update target to update everything.
    • When should it be called?
      • Manually: same problem as now.
      • As a cron job in Travis / a GitHub Action: the script can create a PR.
  3. Create a separate package, searx-data, that is updated automatically. This requires trusting the automated process.
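To make option 2 concrete, here is a minimal sketch of what a single data-update step behind make data.update might do. Everything except the make target name is an assumption: the function names update_data_file and update_all, the JSON file format, and the fetch callables (which would download fresh data from some external source) are hypothetical. The key idea is to serialize deterministically and rewrite a file only when its content actually changed, so a scheduled CI job can open a PR only when there is a real diff.

```python
import json
import os
import tempfile
from typing import Any, Callable, Dict, List


def update_data_file(path: str, fetch: Callable[[], Dict[str, Any]]) -> bool:
    """Refresh one JSON data file; return True if its content changed.

    `fetch` is a hypothetical callable that downloads fresh data and
    returns it as a dict (no real searx API is implied here).
    """
    new_data = fetch()
    # Serialize deterministically (sorted keys, fixed indent) so identical
    # data always yields identical bytes and a clean git diff.
    new_text = json.dumps(new_data, indent=4, sort_keys=True, ensure_ascii=False) + "\n"

    try:
        with open(path, encoding="utf-8") as f:
            old_text = f.read()
    except FileNotFoundError:
        old_text = None  # first run: the file does not exist yet

    if new_text == old_text:
        return False  # nothing changed, so there is nothing to commit

    # Write to a temporary file in the same directory, then atomically
    # replace the data file, so it is never left half-written.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(new_text)
    os.replace(tmp_path, path)
    return True


def update_all(fetchers: Dict[str, Callable[[], Dict[str, Any]]]) -> List[str]:
    """Run every fetcher and return the paths of the files that changed."""
    return [path for path, fetch in fetchers.items() if update_data_file(path, fetch)]
```

A periodic job (cron, Travis, or a GitHub Action) could call update_all and create a PR only when the returned list is non-empty; with an unchanged upstream source, repeated runs produce no diff and therefore no PR.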
asciimoo commented 4 years ago

Maybe I'm overthinking.

I don't think you're overthinking; this is a real issue that we need to address.

Of the solutions you suggested, I'd pick option 2, with periodic automatic updates.

dalf commented 4 years ago

Some brainstorming: https://github.com/asciimoo/searx/wiki/Brainstorming:-IDE-&-database

return42 commented 4 years ago

Brainstorming: IDE & database

@dalf, thanks for your article. Here are my two cents:

IDE: for me, developing or fixing bugs in a searx engine is a very individual task, where I want the maximum freedom to use whichever Swiss army knife best fits the context.

An IDE helps flatten the learning curve; the flip side of the coin is that the quality of contributions regresses and the maintainers have to discuss the same subjects again and again. I remember all the contributions with "Update <filename>" commit messages (e.g. https://github.com/asciimoo/searx/pull/1941). I mean, IDEs are really good at flattening the learning curve, but they don't help if the know-how is missing.

Database: mostly the same as what I said about IDEs, except that it could be a solution for regular updates, decoupling engine development from the searx core. For that, a git repository seems more suitable than a database. But the comment https://github.com/asciimoo/searx/issues/2052#issuecomment-655607459 says he wants to keep the data in the searx git repository.

dalf commented 4 years ago

I want the maximum freedom to use whichever Swiss army knife best fits the context.

The purpose is not to enforce a way of developing; it is to offer a quick and easy way to develop engines and update data (and nothing concerning the core).

My ideas are not crystal clear yet, but to sum up with an example:

A reviewer can use the same tool to check the PR, but, once again, the usual tools (git, make, etc.) work too.

The purpose here is to allow contributions with just a browser (*). And, for sure:

So, this idea is not for tomorrow.

(*) I say a browser because it is an easy way to have a rich UI, but a console UI is also a solution.