Open aborruso opened 4 years ago
By now, you can force the type for a given column to be rows.fields.TextField
. Add force_types
parameter to the import function: rows.import_from_html(..., force_types={"field_name": rows.fields.TextField})
.
Turicas thank you.
My request is related to command line use. Is it possible to do it via cli?
Best regards
Oh, sorry. Let's reopen it.
Regarding to CLI: unfortunately not all CLI commands support forcing types. Some of them (csv2sqlite
, pgimport
) support passing a schema file via --schema
, but this way you need specify all field types.
But there's a hack you could do: convert CSV to SQLite forcing text (or integer or whatever type you want) and then back to CSV. Like this:
Let's say you have a file test.csv
with the following contents:
a,b
06583,some text
123,more text
Create the file schema.csv
with:
field_name,field_type
a,text
b,text
Then, execute the commands:
rows csv2sqlite --schemas=schema.csv test.csv test-temp.sqlite
rows sqlite2csv test-temp.sqlite test test2.csv
rm test-temp.sqlite
and use test2.csv
.
@aborruso if you think the field is being detected incorrectly (it was detected as float
while you think it should be detected as int
, for example), please open a new issue with data I can reproduce.
hi @turicas my input is this example HTML
<!DOCTYPE html>
<html>
<body>
<table id="results" border="0" class="regpub_dati c35">
<tbody>
<tr class="c28">
<th class="c27">Beneficiario</th>
<th class="c27">Comune</th>
<th class="c27">CAP</th>
<th class="c27">Provincia </th>
<th class="c27">Importo</th>
</tr>
<tr>
<td class="c31">RNDFNC60E16</td>
<td class="c31">RIPACANDIDA</td>
<td class="c31">85020</td>
<td class="c31">POTENZA</td>
<td class="c34">09269</td>
</tr>
<tr>
<td class="c31">RNDFNC60E16</td>
<td class="c31"></td>
<td class="c31"></td>
<td class="c31">POTENZA</td>
<td class="c34">05269</td>
</tr>
</tbody>
</table>
</body>
</html>
If I run rows convert input.html output.csv
I have
+--------------+-------------+-------+-----------+---------+
| beneficiario | comune | cap | provincia | importo |
+--------------+-------------+-------+-----------+---------+
| RNDFNC60E16 | RIPACANDIDA | 85020 | POTENZA | 9269.0 |
| RNDFNC60E16 | - | - | POTENZA | 5269.0 |
+--------------+-------------+-------+-----------+---------+
The last field is a string and it becomes a float.
I'm not able to create a PR and help you. But rows is really a great tool to convert almost everything to a usable table; my suggestion, my feature request is to add to the cli, an option like -I
that disable every kind of field inferencing, that force all field to be text fields. Then it will be the user, after the first import, to apply the right casting.
Thank you
Hi, I have an input HTML table in which a field has values as
06583
. If I convert this table to CSV, this value becomes6583.0
.It would be great to have a cli option to completely disable field type inferencing, and to have all output values as text field.
Thank you