h1alexbel / srdataset

GitHub repositories dataset that contains sample repositories (SRs), with their metrics and metadata
MIT License
4 stars, 0 forks

feat(#22): md_to_text before filter by language #27

Closed h1alexbel closed 3 months ago

h1alexbel commented 3 months ago

closes #22


PR-Codex overview

This PR updates the code to read the CSV file name from an environment variable, applies a new filtering step to convert readme content to text, and improves CSV export functionality.

Detailed summary

The following files were skipped due to too many changes: tests/filtered-with-null-topics.csv, tests/filter-expected.csv
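
To make the overview concrete, here is a minimal sketch of the flow it describes: take the CSV file name from an environment variable, convert each readme from Markdown to plain text, filter by language only after that conversion, and export the result. Everything specific in it is an assumption for illustration and is not taken from the PR diff: the `CSV` variable name, the `repos.csv` default, the `readme` column, the regex-based `md_to_text`, and the ASCII-ratio `is_english` check are all hypothetical stand-ins for whatever the repository actually uses.

```python
import csv
import os
import re


def md_to_text(markdown: str) -> str:
    """Roughly reduce Markdown to plain text (regex approximation, not a parser)."""
    text = re.sub(r"```.*?```", " ", markdown, flags=re.DOTALL)  # drop fenced code
    text = re.sub(r"`([^`]*)`", r"\1", text)                     # unwrap inline code
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", " ", text)            # drop images
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)         # keep link labels
    text = re.sub(r"^\s{0,3}#{1,6}\s*", "", text, flags=re.MULTILINE)  # heading marks
    text = re.sub(r"[*_~>]", "", text)                           # emphasis, quote marks
    return re.sub(r"\s+", " ", text).strip()


def is_english(text: str, threshold: float = 0.9) -> bool:
    """Hypothetical language check: share of ASCII letters among all letters."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    ascii_letters = sum(1 for ch in letters if ch.isascii())
    return ascii_letters / len(letters) >= threshold


def filter_rows(rows):
    """Convert each readme to text first, then keep only rows that pass the check."""
    kept = []
    for row in rows:
        text = md_to_text(row.get("readme") or "")
        if is_english(text):
            kept.append({**row, "readme": text})
    return kept


def main():
    # Assumed variable name and default; the PR only says the name comes from the environment.
    csv_name = os.environ.get("CSV", "repos.csv")
    with open(csv_name, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        fields = reader.fieldnames or []
        rows = filter_rows(reader)
    with open("filtered-" + csv_name, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
```

Running the conversion before the language check means Markdown syntax, URLs, and code fences no longer count toward the check, which is presumably the motivation for the ordering named in the PR title. `main()` is left uncalled here; invoking it expects the CSV file to exist in the working directory.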


h1alexbel commented 3 months ago

@rultor merge

rultor commented 3 months ago

> @rultor merge

@h1alexbel OK, I'll try to merge now. You can check the progress of the merge here

rultor commented 3 months ago

> @rultor merge

@h1alexbel Done! FYI, the full log is here (took me 7min)