okfn-brasil / jarbas

🎩 API for information and suspicions about reimbursements by Brazilian congresspeople
https://jarbas.serenata.ai/
296 stars 61 forks source link

Importing and serializing company data #263

Open Ilozuluchris opened 7 years ago

Ilozuluchris commented 7 years ago

This is a WIP but I want to invite @cuducos to review what I have done so far #211

cuducos commented 7 years ago

rows.fields.EmailField raises a value error when it tries to serialize 'ahoy'

IMHO that is expected as 'ahoy' is not a valid email address… right?

Ilozuluchris commented 7 years ago

Yeah,it is...the serialize method returned invalid email addresses as none.I just mentioned it since that was not how the 'serialize' method handled invalid email addresses. Do you have any corrections to make?

Ilozuluchris commented 7 years ago

@cuducos I am confused,I do not know how to go about this https://github.com/datasciencebr/jarbas/blob/f95addd5ea4c47f161796cf1e72a58050f66ce60/jarbas/core/management/commands/suspicions.py#L50 since the serialize method actually reshapes rows

cuducos commented 7 years ago

@cuducos I am confused,I do not know how to go about this

Would you mind expanding on your doubt? I mean… you just highlighted the method we're trying to get rid of, so I don't know exactly where you doubt lies…

Ilozuluchris commented 7 years ago
    reserved_keys = (
        'applicant_id',
        'document_id',
        'probability',
        'year'
    )
    hypothesis = tuple(k for k in row.keys() if k not in reserved_keys)
    pairs = ((k, v) for k, v in row.items() if k in hypothesis)
    filtered = filter(lambda x: self.bool(x[1]), pairs)
    suspicions = {k: True for k, _ in filtered} or None

    return dict(
        document_id=document_id,
        probability=probability,
        suspicions=suspicions
    )

I was talking about this arrangement,am not sure how to go about this with rows

cuducos commented 7 years ago

Ow… you're right! I don't know any thing by heart, maybe here we do need a serialization method!… @turicas, any idea on that?

Our input is a CSV like:

document_id, hypothesis_1, hypotehesis_2, hypothesis_3
42,True,False,True

And the expected output is something like:

[
    {
        'document_id': 42,
        'suspicions': {
            'hypothesis_1': True,
            'hypothesis_3': True
        }
    }
]
Ilozuluchris commented 6 years ago

Hello @cuducos, anything left wrt issue #211

cuducos commented 6 years ago

Hello @cuducos, anything left wrt issue #211

Yep, tests are failing, so there is still something to fix. You might consider adding rows to the requirements.txt; )

Ilozuluchris commented 6 years ago

Okay I will, thanks.

Ilozuluchris commented 6 years ago

@cuducos The travis CI job does not run, the requests page reports this

GitHub payload is missing a merge commit (mergeable_state: "unknown", merged: false)

I believe it has to do with the merge conflict wrt requirements.txt. I tried to resolve myself but that did not work as expected.

cuducos commented 6 years ago

The travis CI job does not run

After fixing conflicts it is still red:

======================================================================
ERROR: test_save_companies (jarbas.core.tests.test_companies_command.TestCreate)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/python/3.5.4/lib/python3.5/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/opt/python/3.5.4/lib/python3.5/unittest/case.py", line 605, in run
    testMethod()
  File "/opt/python/3.5.4/lib/python3.5/unittest/mock.py", line 1159, in patched
    return func(*args, **keywargs)
TypeError: test_save_companies() missing 1 required positional argument: 'lzma'
cuducos commented 6 years ago

@Ilozuluchris, I'm not sure why the CodeClimate checks are missing. Gonna fix that and then we come back to this PR, ok?

Ilozuluchris commented 6 years ago

@cuducos Okay no problem, I am sorry it took so long to do this.

Ilozuluchris commented 6 years ago

@cuducos All checks passed.