Hi!

Nice script you've got here. I am playing with Wikipedia dumps at the moment and it really helps. However, there is a problem with dumps that contain a binary field. For example, if you convert the `categorylinks` table dump to CSV and feed the resulting CSV to Pandas, Pandas will choke on the `cl_sortkey` field and die. So I thought it would be useful to have a way to ignore such columns, and added an optional argument for it.

This argument does not break the current argument structure. All previous use cases are still valid, including the one with reading from stdin.

I've also described the new argument in `README.md`. Feel free to correct my language.
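
To illustrate the idea, here is a minimal sketch of skipping columns by position before writing CSV. It is not the code from this PR: the names `drop_columns` and `rows_to_csv` and the `ignored` parameter are hypothetical, and it assumes the rows have already been parsed into lists of values.

```python
import csv
import io


def drop_columns(rows, ignored):
    """Yield each row without the columns whose zero-based index is in `ignored`."""
    ignored = set(ignored)
    for row in rows:
        yield [value for index, value in enumerate(row) if index not in ignored]


def rows_to_csv(rows, ignored=()):
    """Serialize rows to a CSV string, skipping the ignored column indexes.

    Hypothetical helper for illustration only; with the default empty
    `ignored`, every column is written unchanged.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(drop_columns(rows, ignored))
    return buffer.getvalue()
```

For example, `rows_to_csv(rows, ignored=[2])` would leave out the third column of every row, which is how a binary column could be kept away from Pandas. The index used here is only an example.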