yndx-metrika / logs_api_integration

Script for integration with Logs API
46 stars 50 forks

Another data parsing error #9

Open umaxfun opened 7 years ago

umaxfun commented 7 years ago

Hello,

We've run into a problem: during loading, ClickHouse cannot parse the input data. DB::Exception: Cannot parse input: expected \n before: \tym:s:date\tym:s:dateTime\tym:s:goalsID\tym:s:isNewUser\tym:s:lastAdvEngine\tym:s:lastClickBannerGroupName\tym:s:lastDirectClickOrder\tym:s:lastDirectClickOrderName\tym, e.what() = DB::Exception

Full log (the token and site URL have been changed; counterId is the real one):

C:\Users\artem.gusev\Desktop\logs_api_integration-master>python metrica_logs_api.py -source visits -start_date 2017-07-21 -end_date 2017-07-25
2017-07-31 20:24:32 MainProcess INFO     CLI Options: Namespace(end_date='2017-07-25', mode=None, source='visits', start_date='2017-07-21')
2017-07-31 20:24:32 MainProcess INFO     UserRequest(token=u'xxx', counter_id=u'11881159', start_date_str='2017-07-21', end_date_str='2017-07-25', source='visits', fields=(u'ym:s:counterID', u'ym:s:dateTime', u'ym:s:date', u'ym:s:visitDuration', u'ym:s:bounce', u'ym:s:pageViews', u'ym:s:goalsID', u'ym:s:clientID', u'ym:s:lastTrafficSource', u'ym:s:lastAdvEngine', u'ym:s:lastSearchEngineRoot', u'ym:s:visitID', u'ym:s:startURL', u'ym:s:browser', u'ym:s:isNewUser', u'ym:s:lastReferalSource', u'ym:s:referer', u'ym:s:lastDirectClickOrder', u'ym:s:UTMCampaign', u'ym:s:UTMContent', u'ym:s:UTMMedium', u'ym:s:UTMSource', u'ym:s:UTMTerm', u'ym:s:regionCity', u'ym:s:lastDirectClickOrderName', u'ym:s:lastClickBannerGroupName'))
2017-07-31 20:24:33 MainProcess INFO     ### CREATING TASK
{
  "date1_str": "2017-07-21",
  "date2_str": "2017-07-25",
  "request_id": 189036,
  "status": "created",
  "user_request": [
    "xxx",
    "11881159",
    "2017-07-21",
    "2017-07-25",
    "visits",
    [
      "ym:s:counterID",
      "ym:s:dateTime",
      "ym:s:date",
      "ym:s:visitDuration",
      "ym:s:bounce",
      "ym:s:pageViews",
      "ym:s:goalsID",
      "ym:s:clientID",
      "ym:s:lastTrafficSource",
      "ym:s:lastAdvEngine",
      "ym:s:lastSearchEngineRoot",
      "ym:s:visitID",
      "ym:s:startURL",
      "ym:s:browser",
      "ym:s:isNewUser",
      "ym:s:lastReferalSource",
      "ym:s:referer",
      "ym:s:lastDirectClickOrder",
      "ym:s:UTMCampaign",
      "ym:s:UTMContent",
      "ym:s:UTMMedium",
      "ym:s:UTMSource",
      "ym:s:UTMTerm",
      "ym:s:regionCity",
      "ym:s:lastDirectClickOrderName",
      "ym:s:lastClickBannerGroupName"
    ]
  ]
}
2017-07-31 20:24:33 MainProcess INFO     ### DELAY 20 secs
2017-07-31 20:24:53 MainProcess INFO     ### CHECKING STATUS
2017-07-31 20:24:53 MainProcess INFO     API Request status: created
2017-07-31 20:24:53 MainProcess INFO     ### DELAY 20 secs
2017-07-31 20:25:13 MainProcess INFO     ### CHECKING STATUS
2017-07-31 20:25:14 MainProcess INFO     API Request status: created
2017-07-31 20:25:14 MainProcess INFO     ### DELAY 20 secs
2017-07-31 20:25:34 MainProcess INFO     ### CHECKING STATUS
2017-07-31 20:25:34 MainProcess INFO     API Request status: processed
2017-07-31 20:25:34 MainProcess INFO     ### SAVING DATA
2017-07-31 20:25:34 MainProcess INFO     Part #0
2017-07-31 20:25:43 MainProcess INFO     ### DATA SAMPLE
2017-07-31 20:25:43 MainProcess INFO     ym:s:bounce    ym:s:browser    ym:s:clientID   ym:s:counterID  ym:s:date
ym:s:dateTime   ym:s:goalsID    ym:s:isNewUser  ym:s:lastAdvEngine      ym:s:lastClickBannerGroupName   ym:s:lastDirectClickOrder       ym:s:lastDirectClickOrderName   ym:s:lastReferalSource  ym:s:lastSearchEngineRoot       ym:s:lastTrafficSource  ym:s:pageViews  ym:s:referer    ym:s:regionCity ym:s:startURL   ym:s:UTMCampaign        ym:s:UTMContent ym:s:UTMMedium  ym:s:UTMSource  ym:s:UTMTerm    ym:s:visitDuration      ym:s:visitID
0       chromemobile    1500805991669493906     11881159        2017-07-23      2017-07-23 14:27:39     [2653102,4081657,6236337,4081657,6236337,4081657,6236337,4081657,6236337,4091284,6236337,6236751,6236754,4091284,6236337,6236751,6236754,4081657,6236337]       1       unknown                 0                       ad      9                       http://zzz.ru/?admitad_uid=6613ac1c9fa84ebbf077f558d22b2164&advcake=1     admitad 278512  cpa     advcake         198
4178276555863625609
0       yandex_browser  1498716776604298382     11881159        2017-07-21      2017-07-21 10:05:24     [4024870,6236871,2653102,4024870,6236871,4024870,6236871,4024870,6236871,4024870,6236871,4024870,6236871,4024870,6236871,4024870,6236871,4024870,6236871,4024870,6236871,4024870,6236871,4024870,6236871,4091284,4256032,6236337,6236751,6236754,4024870,6236871,4024870,6236871,4024870,6236871,4024870,6236871,4024870,6236871,4024870,6236871,4024870,6236871,4024870,6236871]
0       ya_undefined                    0       yandex.ru       yandex  organic 24      http://yandex.ru/clck/jsredir?from=yandex.ru;search%2F;web;;&text=&etext=1488&&l10n=ru  Vidnoe  http://zzz.ru/tury
                829     4128853169167929226
0       yandex_browser  149737450090090447      11881159        2017-07-22      2017-07-22 00:28:35     [4091284,6236751,6236754]       0       ya_undefined                    0                       direct  1               Severodvinsk
http://zzz.ru/turkey/resorts/alanya/hotels/tac-premier-hotel-spa-4.html#?fromCity=2&dateFrom=07.09.2017&dateTo=07.09.2017&nightFrom=10&nightTo=12&priceFrom=6000&priceTo=1000000&adults=2&kids=1&meal=all&activeTab=tours
                        16      4142429772089134798
0       safari_mobile   1500634811224441019     11881159        2017-07-21      2017-07-21 14:00:10     [4091284,6236751,6236754,28111274]      1       ya_undefined                    0       yandex.ru       yandex  organic 1       https://yandex.ru/      Moscow  http://m.zzz.ru/hungary/resorts/budapest#?fromCity=2&toCountry=20&toCity=358&dateFrom=29.07.2017&dateTo=29.07.2017&nightFrom=7&nightTo=8&adults=2&hotelClass=all&meal=all&priceFrom=6000&priceTo=1000000&sort=recommend                                           26      4132545904312059738
2017-07-31 20:25:44 MainProcess WARNING  1 rows were filtered out
2017-07-31 20:25:48 MainProcess CRITICAL Iteration #1 failed
Traceback (most recent call last):
  File "metrica_logs_api.py", line 127, in <module>
    integrate_with_logs_api(config, user_request)
  File "metrica_logs_api.py", line 107, in integrate_with_logs_api
    raise e
ValueError: Code: 27, e.displayText() = DB::Exception: Cannot parse input: expected \n before: \tym:s:date\tym:s:dateTime\tym:s:goalsID\tym:s:isNewUser\tym:s:lastAdvEngine\tym:s:lastClickBannerGroupName\tym:s:lastDirectClickOrder\tym:s:lastDirectClickOrderName\tym, e.what() = DB::Exception

Could you please advise what to do in this situation?
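The message "expected \n before: \tym:s:date..." means that ClickHouse's TabSeparated parser read as many fields as the target table has columns and then found another tab where it expected a line break. Because the leftover text consists of field names, one plausible reading is that a header line was parsed as a data row. Below is a minimal Python 3 sketch for checking a downloaded part before inserting it. `find_bad_rows` and the expected column count are illustrative assumptions, not part of the script:

```python
def find_bad_rows(tsv_text, expected_columns):
    """Return (line_no, n_fields, looks_like_header) for suspicious lines.

    In TabSeparated data, tabs and newlines inside values are escaped,
    so splitting on raw tab/newline characters is safe here.
    """
    bad = []
    for line_no, line in enumerate(tsv_text.split("\n"), start=1):
        if line == "":
            continue  # skip the empty string left by a trailing newline
        fields = line.split("\t")
        # Header lines from the Logs API start with field names like "ym:s:..."
        looks_like_header = fields[0].startswith("ym:")
        if len(fields) != expected_columns or looks_like_header:
            bad.append((line_no, len(fields), looks_like_header))
    return bad


sample = "ym:s:date\tym:s:visitID\n2017-07-21\t123\n"
print(find_bad_rows(sample, expected_columns=2))
```

Any line reported here is a candidate for the row ClickHouse rejected: either a header that was not filtered out or a row whose field count does not match the table.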

miptgirl commented 7 years ago

Hello!

I tried running the script with the same settings on a clean database, and the problem does not reproduce.

My guess is that the list of exported fields in the config changed after the previous run, so the new data cannot be written into the old table structure. In that case, drop the table and rerun the script so that it creates a table with the correct structure.
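The check described above can be sketched in Python 3: compare the configured fields with the table's current columns and, if they differ, drop the table through the ClickHouse HTTP interface so the script can recreate it. The naming rule in `field_to_column` is a hypothetical assumption (the script's real column names may differ). The default URL `http://localhost:8123/` and the table name are also placeholders:

```python
import urllib.request


def field_to_column(field):
    # HYPOTHETICAL naming rule: "ym:s:visitID" -> "VisitID".
    # Check the column names the script actually creates before relying on this.
    name = field.split(":")[-1]
    return name[:1].upper() + name[1:]


def needs_recreate(config_fields, table_columns):
    """True if the table's columns differ from the configured fields.

    TabSeparated inserts without a column list are positional,
    so the order of columns is compared as well.
    """
    return [field_to_column(f) for f in config_fields] != list(table_columns)


def drop_table(table, url="http://localhost:8123/"):
    """POST a DROP TABLE IF EXISTS query to the ClickHouse HTTP interface."""
    query = "DROP TABLE IF EXISTS {}".format(table).encode("utf-8")
    with urllib.request.urlopen(url, data=query) as resp:
        return resp.read()
```

For example, `needs_recreate(["ym:s:date", "ym:s:visitID"], ["Date", "VisitID"])` is `False`, while adding a field to the config makes it `True`. Only in that case would `drop_table` be called before rerunning the script.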

umaxfun commented 7 years ago

I took a different counter, a different set of fields, a different operating system 😎, and a clean database, and the problem looks the same:

('##### python', '2.7.11')
2017-08-06 16:04:59 MainProcess INFO     CLI Options: Namespace(end_date=None, mode='history', source='hits', start_date=None)
2017-08-06 16:04:59 MainProcess INFO     Starting new HTTPS connection (1): api-metrika.yandex.ru
2017-08-06 16:04:59 MainProcess INFO     UserRequest(token=u'xxx', counter_id=u'3199561', start_date_str=u'2011-01-11', end_date_str='2017-08-04', source='hits', fields=(u'ym:pv:counterID', u'ym:pv:dateTime', u'ym:pv:date', u'ym:pv:clientID', u'ym:pv:title', u'ym:pv:URL', u'ym:pv:referer'))
2017-08-06 16:04:59 MainProcess INFO     Starting new HTTP connection (1): localhost
2017-08-06 16:04:59 MainProcess INFO     Starting new HTTPS connection (1): api-metrika.yandex.ru
2017-08-06 16:04:59 MainProcess INFO     ### CREATING TASK
2017-08-06 16:04:59 MainProcess INFO     Starting new HTTPS connection (1): api-metrika.yandex.ru
{
  "date1_str": "2011-01-11",
  "date2_str": "2011-07-29",
  "request_id": 194331,
  "status": "created",
  "user_request": [
    "xxx",
    "3199561",
    "2011-01-11",
    "2017-08-04",
    "hits",
    [
      "ym:pv:counterID",
      "ym:pv:dateTime",
      "ym:pv:date",
      "ym:pv:clientID",
      "ym:pv:title",
      "ym:pv:URL",
      "ym:pv:referer"
    ]
  ]
}
2017-08-06 16:04:59 MainProcess INFO     ### DELAY 20 secs
2017-08-06 16:05:19 MainProcess INFO     ### CHECKING STATUS
2017-08-06 16:05:19 MainProcess INFO     Starting new HTTPS connection (1): api-metrika.yandex.ru
2017-08-06 16:05:19 MainProcess INFO     API Request status: created
2017-08-06 16:18:02 MainProcess INFO     ### DELAY 20 secs
2017-08-06 16:18:22 MainProcess INFO     ### CHECKING STATUS
2017-08-06 16:18:22 MainProcess INFO     Starting new HTTPS connection (1): api-metrika.yandex.ru
2017-08-06 16:18:22 MainProcess INFO     API Request status: created
2017-08-06 16:18:22 MainProcess INFO     ### DELAY 20 secs
2017-08-06 16:18:42 MainProcess INFO     ### CHECKING STATUS
2017-08-06 16:18:42 MainProcess INFO     Starting new HTTPS connection (1): api-metrika.yandex.ru
2017-08-06 16:18:42 MainProcess INFO     API Request status: processed
2017-08-06 16:18:42 MainProcess INFO     ### SAVING DATA
2017-08-06 16:18:42 MainProcess INFO     Part #0
2017-08-06 16:18:42 MainProcess INFO     Starting new HTTPS connection (1): api-metrika.yandex.ru
2017-08-06 16:18:42 MainProcess INFO     ### DATA SAMPLE
2017-08-06 16:18:42 MainProcess INFO     ym:pv:clientID ym:pv:counterID ym:pv:date  ym:pv:dateTime  ym:pv:referer   ym:pv:title ym:pv:URL

2017-08-06 16:18:42 MainProcess WARNING  1 rows were filtered out
2017-08-06 16:18:42 MainProcess INFO     Starting new HTTP connection (1): localhost
2017-08-06 16:18:42 MainProcess INFO     Starting new HTTP connection (1): localhost
2017-08-06 16:18:42 MainProcess INFO     Starting new HTTP connection (1): localhost
2017-08-06 16:18:42 MainProcess INFO     Starting new HTTP connection (1): localhost
2017-08-06 16:18:42 MainProcess INFO     Starting new HTTP connection (1): localhost
2017-08-06 16:18:43 MainProcess CRITICAL Iteration #1 failed
Traceback (most recent call last):
  File "metrica_logs_api.py", line 127, in <module>
    integrate_with_logs_api(config, user_request)
  File "metrica_logs_api.py", line 107, in integrate_with_logs_api
    raise e
ValueError: Code: 27, e.displayText() = DB::Exception: Cannot parse input: expected \n at end of stream., e.what() = DB::Exception

umaxfun commented 7 years ago

Apparently I'm doing something very wrong. How can I figure out what exactly?

umaxfun commented 7 years ago

At the same time, the script created a proper database structure. I downloaded a part from Metrica with wget https://api-metrika.yandex.ru/management/v1/counter/3199561/logrequest/194328/part/1/download?auth_token=xxx -O c1, and it loads fine by hand with cat c1 | sed "1d" | clickhouse-client --query="insert into qwe1.hits_all FORMAT TabSeparated" :) so apparently the problem is in the loading script.
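The manual pipeline above (drop the first line with sed "1d", then insert as TabSeparated) can be reproduced in Python 3 roughly as follows. This is a sketch, not the script's actual code. The second error ("expected \n at end of stream") suggests the last row may have lacked a terminating newline, and the empty data sample with "1 rows were filtered out" suggests a part may contain only a header, so the sketch guarantees a trailing newline and skips empty bodies. The ClickHouse URL `http://localhost:8123/` is an assumed default:

```python
import urllib.parse
import urllib.request


def prepare_tsv(raw):
    """Mimic `sed "1d"`: drop the header line and keep a trailing newline."""
    body = "\n".join(raw.split("\n")[1:])
    if body and not body.endswith("\n"):
        body += "\n"  # the TabSeparated parser expects every row to end with \n
    return body


def insert_tsv(table, body, url="http://localhost:8123/"):
    """Send the body to ClickHouse's HTTP interface as an INSERT ... FORMAT TabSeparated."""
    if not body:
        return None  # header-only part: nothing to insert
    query = "INSERT INTO {} FORMAT TabSeparated".format(table)
    full_url = url + "?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(full_url, data=body.encode("utf-8")) as resp:
        return resp.read()
```

Usage mirroring the command above would be `insert_tsv("qwe1.hits_all", prepare_tsv(open("c1", encoding="utf-8").read()))`. Comparing what this sends with what the script sends is one way to narrow down where the two diverge.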