Closed by charliejeynes 2 years ago
@charliejeynes thanks for your suggestions. Yes, we have tried PySpark but have not implemented this on Dask yet. I hope the pipeline would be similar.
If someone has implemented it, please feel free to add it or make a pull request!
Ok great I'll have an experiment and if it's worth it I'll make a pull request 🙂
Your documentation suggests you have tried using Dask rather than PySpark to process PubMed/MEDLINE. If you have any examples or tips, it would be great if you would share them. I'm going to attempt this myself, but it would be good to get some tips if you have already tried. I'm not keen on PySpark but like Dask from what I've seen. Thanks