nipunsadvilkar / pySBD

🐍💯pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection module that works out of the box.
MIT License
813 stars, 84 forks

Error when processing this paragraph #87

Closed GabrielLin closed 3 years ago

GabrielLin commented 3 years ago
We start by discussing the systematic discrepancy between results on comparable TI single crystals obtained by means of ARPES and transport experiments. The radial scatter plot of Fig. 1 compares the binding energy of the Dirac point obtained either by ARPES experiments (red circles) or by Shubnikov de Haas (SdH) oscillations in magneto-transport (blue circles). The value of E F -E D (i.e. the Dirac point binding energy) increases radially: the border of the inner circle corresponds to zero binding energy of the Dirac point (i.e. E D =E F ) and each tick denotes an increase of 100 meV. Each data point in the figure corresponds to a different experimental study in the literature, showing the work of many groups, including our own, and results are shown for five different TI compounds. A general conclusion can be readily made. ARPES shows a systematically higher binding energy for the Dirac point than magneto-transport experiments. We note that several ARPES studies [7, 8, 20-24, 26, 28, 29, 32, 39, 43, 45] have observed energy shifts to higher binding energies because of surface band bending on intentional and unintentional (= 'aging') surface decoration. In order to maintain a fair comparison with magneto-transport, the filled red circles in Fig. 1 correspond to surfaces that have been neither decorated nor aged in UHV. Such data points have been acquired in a time frame between a few minutes and 2 hours after cleavage. Empty markers show the value of E D -by means of ARPES-on exposure to air (empty squares) or on increasing exposure to the residual UHV gases (empty circles). Such surface decoration might be an even more important issue in magneto-transport experiments, as such experiments do not take place in a UHV environment and generally do not involve in-situ cleavage of the single crystalline sample. 
However, the magneto-transport data seems relatively insensitive to surface decoration as the binding energies of the Dirac point are smaller than even the most pristine surfaces studied by ARPES. Fig. 1 makes it clear that surface decoration alone cannot be the key to the observed differences between ARPES and QO experiments, and thus the conclusion drawn earlier -that the E D values obtained by SdH oscillations cannot be systematically reproduced by ARPES even in the most pristine surfaces -is still valid. In the following, we will explain where the difference in the experimentally determined E D comes from between the two techniques, and we will discuss whether we can approach the SdH values by means of ARPES. Fig. 2 shows the first experimental evidence that the surface band bending of 3D TIs is modified substantially on exposure to EUV illumination of a duration of a single second, compared to the typical timescale of ARPES data collection for an I(E, k) image of tens of seconds or even several minutes. In order to highlight that the development of the band bending is indeed dominated by EUV exposure, and not by simple surface decoration with residual UHV gases, as has generally been believed [7, 8, [20] [21] [22] [23] [24] 43] , we have constructed the following experimental protocol. Firstly, we have intentionally exposed all cleavage surfaces to residual UHV gases for 3 hours at low temperature before the first measurement. Secondly, we have limited the duration of each measurement (and hence the EUV exposure) to a minimum of 1-2 seconds using a photon flux of 3.2 × 10 21 photons/(s m 2 ). The optimization of the sample position with respect to the electron energy analyzer and the photon beam, and the adjustment of the emission angles -such that the detector image cuts through the center of the Brillouin zone-were carried out on a part of the cleave one or more millimeters away from the point where the data of Figs. 2 and 3 were recorded. 
This means that the E D values for the locations measured for Figs. 2 and 3 represent those for regions with carefully controlled EUV exposure [62] .

Please help to fix it. Thanks.

nipunsadvilkar commented 3 years ago

@GabrielLin Can you provide the python traceback as well?

GabrielLin commented 3 years ago

@nipunsadvilkar, here it is.

```pycon
>>> seg = pysbd.Segmenter(language="en", clean=False)
>>> print(seg.segment(a))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/XXXX/.conda/envs/py36dmc/lib/python3.6/site-packages/pysbd/segmenter.py", line 87, in segment
    postprocessed_sents = self.processor(text).process()
  File "/home/XXXX/.conda/envs/py36dmc/lib/python3.6/site-packages/pysbd/processor.py", line 34, in process
    self.replace_abbreviations()
  File "/home/XXXX/.conda/envs/py36dmc/lib/python3.6/site-packages/pysbd/processor.py", line 180, in replace_abbreviations
    self.text = self.abbreviations_replacer().replace()
  File "/home/XXXX/.conda/envs/py36dmc/lib/python3.6/site-packages/pysbd/abbreviation_replacer.py", line 37, in replace
    abbr_handled_text += self.search_for_abbreviations_in_string(line)
  File "/home/XXXX/.conda/envs/py36dmc/lib/python3.6/site-packages/pysbd/abbreviation_replacer.py", line 93, in search_for_abbreviations_in_string
    text, match, ind, char_array
  File "/home/XXXX/.conda/envs/py36dmc/lib/python3.6/site-packages/pysbd/abbreviation_replacer.py", line 111, in scan_for_replacements
    txt = self.replace_period_of_abbr(txt, am)
  File "/home/XXXX/.conda/envs/py36dmc/lib/python3.6/site-packages/pysbd/abbreviation_replacer.py", line 71, in replace_period_of_abbr
    txt,
  File "/home/XXXX/.conda/envs/py36dmc/lib/python3.6/re.py", line 191, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/home/XXXX/.conda/envs/py36dmc/lib/python3.6/re.py", line 301, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/home/XXXX/.conda/envs/py36dmc/lib/python3.6/sre_compile.py", line 562, in compile
    p = sre_parse.parse(p, flags)
  File "/home/XXXX/.conda/envs/py36dmc/lib/python3.6/sre_parse.py", line 855, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "/home/XXXX/.conda/envs/py36dmc/lib/python3.6/sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "/home/XXXX/.conda/envs/py36dmc/lib/python3.6/sre_parse.py", line 702, in _parse
    source.tell() - start)
sre_constants.error: missing ), unterminated subpattern at position 0
```
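The thread does not state the root cause, but the last frames point at one: `replace_period_of_abbr` hands `re.sub` a pattern that `sre_parse` rejects because it opens a group at position 0 and never closes it. A plausible, unconfirmed trigger is an abbreviation captured together with a leading parenthesis, such as the `(i.e` in `(i.e. the Dirac point binding energy)`, being pasted into the pattern without escaping. The sketch below reproduces that class of error with the standard `re` module alone and shows the usual remedy, `re.escape`. The helper names `mark_abbr_period_unsafe` / `mark_abbr_period_safe` and the `∯` placeholder are hypothetical illustrations, not pySBD's actual code.

```python
import re

# Hypothetical trigger: an abbreviation captured together with the "(" that
# precedes it, as in "(i.e. the Dirac point binding energy)". This is an
# assumption about the cause, not something confirmed in the thread.
abbr = "(i.e"
text = "binding energy (i.e. the Dirac point) increases"


def mark_abbr_period_unsafe(txt, abbr):
    # Hypothetical helper: the raw abbreviation is pasted into the pattern,
    # so "(" opens a group that is never closed and compilation fails.
    return re.sub(abbr + r"\.", lambda m: abbr + "∯", txt)


def mark_abbr_period_safe(txt, abbr):
    # re.escape turns "(" and "." into literal characters, so the pattern
    # compiles and matches only the exact text "(i.e.".
    pattern = re.escape(abbr) + r"\."
    # A function replacement avoids backslash processing in the repl string;
    # "∯" is a hypothetical placeholder for the protected period.
    return re.sub(pattern, lambda m: abbr + "∯", txt)


try:
    mark_abbr_period_unsafe(text, abbr)
except re.error as err:
    print("unsafe:", err)  # unsafe: missing ), unterminated subpattern at position 0

print("safe:", mark_abbr_period_safe(text, abbr))
# safe: binding energy (i.e∯ the Dirac point) increases
```

If this diagnosis holds, escaping the matched abbreviation before it is interpolated into the regex is the natural fix on the library side; the discussion above does not confirm which change was actually made.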