mapado / datection

Detect and normalize temporal expressions

Improve parsing performance for big text #10

Closed jdeniau closed 4 years ago

jdeniau commented 9 years ago

Pull request :twisted_rightwards_arrows: created by bitbucket user dream on 2015-06-03 12:07. Last updated on 2015-06-09 15:46. Original Bitbucket pull request id: 10

Participants:

  • @badaz (reviewer) :heavy_check_mark:
  • bitbucket user dream
  • @jdeniau
  • bitbucket user nchaulet (reviewer)
  • bitbucket user O_P_mapado :heavy_check_mark:
  • @dallegoet (reviewer) :heavy_check_mark:

Source: https://github.com/mapado/datection/commit/5986f17c8890 on branch fb-fix-better-parse-perf
Destination: https://github.com/mapado/datection/commit/97ab4769053c on branch master
Merge commit: https://github.com/mapado/datection/commit/76b7f8472e3b

State: MERGED

jdeniau commented 9 years ago

@badaz commented on 2015-06-04 06:23

As with Dim's PR, I don't know the project very well, but the code looks clean; I approve.

jdeniau commented 9 years ago

@badaz approved :heavy_check_mark: the pull request on 2015-06-04 06:23

jdeniau commented 9 years ago

@jdeniau commented on 2015-06-04 07:07

Location: line 295 of datection/tokenize.py

Wouldn't it be better to do 'datection.grammar.{}'.format(self.lang)?

jdeniau commented 9 years ago

@jdeniau commented on 2015-06-04 07:09

Location: line 318 of datection/tokenize.py

Phew, I didn't understand the nested any(not any()) comprehension! :D

jdeniau commented 9 years ago

@jdeniau commented on 2015-06-04 07:12

Same here, I approve, but I didn't really understand it. I only got that in some cases, thanks to probe_kind, we skip certain code paths, but the tokenizer as a whole loses me!

jdeniau commented 9 years ago

Bitbucket user O_P_mapado approved :heavy_check_mark: the pull request on 2015-06-05 14:26

jdeniau commented 9 years ago

@dallegoet approved :heavy_check_mark: the pull request on 2015-06-08 12:48

jdeniau commented 9 years ago

Bitbucket user dream commented on 2015-06-09 12:40

Location: datection/tokenize.py

Wouldn't it be better to do 'datection.grammar.{}'.format(self.lang)?

Why? It doesn't seem any clearer to read, does it?
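For context, the two styles under discussion are roughly the following (a sketch: the surrounding tokenize.py code is not shown in this thread, so the concatenation form is an assumption):

```python
lang = "fr"

# String concatenation (presumably the form in the PR):
module_concat = "datection.grammar." + lang

# str.format, as suggested in the review:
module_format = "datection.grammar.{}".format(lang)

# Both build the same dotted module path.
assert module_concat == module_format == "datection.grammar.fr"
```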

jdeniau commented 9 years ago

Bitbucket user dream commented on 2015-06-09 12:56

Location: datection/tokenize.py

Phew, I didn't understand the nested any(not any()) comprehension! :D

Here is the mutable version with explicit 'break' statements:

contain_datetime_and_date = False
for ds in date_spans:
    # Check whether this date span is fully contained in some datetime span.
    ds_contained_in_a_span_datetime = False
    for dts in datetime_spans:
        if dts[0] <= ds[0] and dts[1] >= ds[1]:
            ds_contained_in_a_span_datetime = True
            break
    # A date span standing on its own means the text mixes dates and datetimes.
    if not ds_contained_in_a_span_datetime:
        contain_datetime_and_date = True
        break

So I find the version as committed more readable.
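For reference, the any(not any(...)) expression debated above can be sketched as follows, with hypothetical sample spans (spans are assumed to be (start, end) offset pairs, as in the loop version):

```python
# Hypothetical sample spans: (start, end) character offsets.
datetime_spans = [(0, 30)]
date_spans = [(5, 15), (40, 50)]  # (40, 50) lies outside every datetime span

# True when at least one date span is NOT contained in any datetime span,
# i.e. the text mixes standalone dates with datetimes.
contain_datetime_and_date = any(
    not any(dts[0] <= ds[0] and dts[1] >= ds[1] for dts in datetime_spans)
    for ds in date_spans
)
```

Both forms compute the same result; the loop version trades compactness for step-by-step readability, while the generator version avoids mutable flag variables.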