nextcloud / suspicious_login

Detect and warn about suspicious IPs logging into Nextcloud
GNU Affero General Public License v3.0

Datasets must have the same number of columns #860

Open roschi02 opened 7 months ago

roschi02 commented 7 months ago

I am getting the following error in Nextcloud. Could anybody help me?

{"reqId":"EwzO4qzJOSwXjUp3EhzW","level":3,"time":"2024-03-02T01:00:01+00:00","remoteAddr":"","user":"--","app":"suspicious_login","method":"","url":"--","message":"Caught unknown error during IPv4 background training","userAgent":"--","version":"28.0.3.2","exception":{"Exception":"Rubix\\ML\\Exceptions\\InvalidArgumentException","Message":"Datasets must have the same number of columns, 48 expected, but 16 given.","Code":0,"Trace":[{"file":"/var/www/nextcloud/apps/suspicious_login/lib/Service/DataLoader.php","line":131,"function":"merge","class":"Rubix\\ML\\Datasets\\Labeled","type":"->"},{"file":"/var/www/nextcloud/apps/suspicious_login/lib/Service/TrainService.php","line":72,"function":"generateRandomShuffledData","class":"OCA\\SuspiciousLogin\\Service\\DataLoader","type":"->"},{"file":"/var/www/nextcloud/apps/suspicious_login/lib/BackgroundJob/TrainJobIpV4.php","line":70,"function":"train","class":"OCA\\SuspiciousLogin\\Service\\TrainService","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/Job.php","line":81,"function":"run","class":"OCA\\SuspiciousLogin\\BackgroundJob\\TrainJobIpV4","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/TimedJob.php","line":102,"function":"start","class":"OCP\\BackgroundJob\\Job","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/TimedJob.php","line":92,"function":"start","class":"OCP\\BackgroundJob\\TimedJob","type":"->"},{"file":"/var/www/nextcloud/cron.php","line":152,"function":"execute","class":"OCP\\BackgroundJob\\TimedJob","type":"->"}],"File":"/var/www/nextcloud/apps/suspicious_login/vendor/rubix/ml/src/Datasets/Labeled.php","Line":364,"message":"Caught unknown error during IPv4 background training","CustomMessage":"Caught unknown error during IPv4 background training"}}
muchachagrande commented 7 months ago

In my case it is very similar, but on IPv6 training:

{"reqId":"ZH7FOecwUjPZ1VFjfqIC","level":3,"time":"2024-03-06T22:20:01-03:00","remoteAddr":"","user":"--","app":"suspicious_login","method":"","url":"--","message":"Caught unknown error during IPv6 background training","userAgent":"--","version":"28.0.3.2","exception":{"Exception":"Rubix\ML\Exceptions\InvalidArgumentException","Message":"Datasets must have the same number of columns, 80 expected, but 16 given.","Code":0,"Trace":[{"file":"/var/www/nextcloud/apps/suspicious_login/lib/Service/DataLoader.php","line":131,"function":"merge","class":"Rubix\ML\Datasets\Labeled","type":"->"},{"file":"/var/www/nextcloud/apps/suspicious_login/lib/Service/TrainService.php","line":72,"function":"generateRandomShuffledData","class":"OCA\SuspiciousLogin\Service\DataLoader","type":"->"},{"file":"/var/www/nextcloud/apps/suspicious_login/lib/BackgroundJob/TrainJobIpV6.php","line":70,"function":"train","class":"OCA\SuspiciousLogin\Service\TrainService","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/Job.php","line":81,"function":"run","class":"OCA\SuspiciousLogin\BackgroundJob\TrainJobIpV6","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/TimedJob.php","line":102,"function":"start","class":"OCP\BackgroundJob\Job","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/TimedJob.php","line":92,"function":"start","class":"OCP\BackgroundJob\TimedJob","type":"->"},{"file":"/var/www/nextcloud/cron.php","line":152,"function":"execute","class":"OCP\BackgroundJob\TimedJob","type":"->"}],"File":"/var/www/nextcloud/apps/suspicious_login/vendor/rubix/ml/src/Datasets/Labeled.php","Line":364,"message":"Caught unknown error during IPv6 background training","CustomMessage":"Caught unknown error during IPv6 background training"},"id":"65eb441600caa"}

It happens every day at about 10 PM.
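
The IPv4 and IPv6 reports differ only in the expected column count (48 versus 80, with 16 given in both cases). The following is a minimal sketch for pulling those numbers out of a Nextcloud log file. It assumes each log line is a single JSON object shaped like the entries quoted in this thread; the function name `column_mismatch` is hypothetical and not part of Nextcloud or the app.

```python
import json
import re

# Matches the Rubix ML message quoted in this thread.
COLUMN_MISMATCH = re.compile(
    r"Datasets must have the same number of columns, "
    r"(\d+) expected, but (\d+) given\."
)


def column_mismatch(log_line: str):
    """Return (app, expected, given) for a column-mismatch entry, else None."""
    try:
        entry = json.loads(log_line)
    except json.JSONDecodeError:
        return None  # not a JSON log line
    if not isinstance(entry, dict):
        return None
    exception = entry.get("exception")
    if not isinstance(exception, dict):
        return None
    match = COLUMN_MISMATCH.search(str(exception.get("Message", "")))
    if match is None:
        return None
    return entry.get("app"), int(match.group(1)), int(match.group(2))


# A shortened entry built from the fields shown in the IPv6 report.
sample = json.dumps({
    "level": 3,
    "app": "suspicious_login",
    "message": "Caught unknown error during IPv6 background training",
    "exception": {
        "Exception": "Rubix\\ML\\Exceptions\\InvalidArgumentException",
        "Message": "Datasets must have the same number of columns, "
                   "80 expected, but 16 given.",
    },
})

print(column_mismatch(sample))
```

Running this over every line of the log would show whether an instance is failing on IPv4 training, IPv6 training, or both.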

Luncheon3462 commented 6 months ago

Same issue for me.

FernandoMarques-Santos commented 5 months ago

Same here.

I run Nextcloud only through IPv6 (for other reasons). Linux 5.15.0-105-generic x86_64 mysql Version: 10.6.16 PHP Version: 8.2.18

{"reqId":"ULKd7cYSJXRUaICAfYUX","level":3,"time":"2024-04-22T13:00:02+00:00","remoteAddr":"","user":"--","app":"suspicious_login","method":"","url":"--","message":"Caught unknown error during IPv6 background training","userAgent":"--","version":"28.0.4.1","exception":{"Exception":"Rubix\ML\Exceptions\InvalidArgumentException","Message":"Datasets must have the same number of columns, 80 expected, but 16 given.","Code":0,"Trace":[{"file":"/var/www/nextcloud/apps/suspicious_login/lib/Service/DataLoader.php","line":131,"function":"merge","class":"Rubix\ML\Datasets\Labeled","type":"->"},{"file":"/var/www/nextcloud/apps/suspicious_login/lib/Service/TrainService.php","line":71,"function":"generateRandomShuffledData","class":"OCA\SuspiciousLogin\Service\DataLoader","type":"->"},{"file":"/var/www/nextcloud/apps/suspicious_login/lib/BackgroundJob/TrainJobIpV6.php","line":69,"function":"train","class":"OCA\SuspiciousLogin\Service\TrainService","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/Job.php","line":81,"function":"run","class":"OCA\SuspiciousLogin\BackgroundJob\TrainJobIpV6","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/TimedJob.php","line":102,"function":"start","class":"OCP\BackgroundJob\Job","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/TimedJob.php","line":92,"function":"start","class":"OCP\BackgroundJob\TimedJob","type":"->"},{"file":"/var/www/nextcloud/cron.php","line":152,"function":"execute","class":"OCP\BackgroundJob\TimedJob","type":"->"}],"File":"/var/www/nextcloud/apps/suspicious_login/vendor/rubix/ml/src/Datasets/Labeled.php","Line":364,"message":"Caught unknown error during IPv6 background training","CustomMessage":"Caught unknown error during IPv6 background training"},"id":"662660a6ea0a5"}

theDepart3d commented 5 months ago

Getting the same error for IPv4

Nextcloud Version: 28.0.4 PHP Version: 8.2.7 MySQL Version: 10.11.4

{
    "reqId": "__REMOVED__",
    "level": 3,
    "time": "2024-04-23",
    "remoteAddr": "",
    "user": "--",
    "app": "suspicious_login",
    "method": "",
    "url": "--",
    "message": "Caught unknown error during IPv4 background training",
    "userAgent": "--",
    "version": "28.0.4.1",
    "exception": {
        "Exception": "Rubix\\ML\\Exceptions\\InvalidArgumentException",
        "Message": "Datasets must have the same number of columns, 48 expected, but 16 given.",
        "Code": 0,
        "Trace": [
            {
                "file": "/var/www/html/apps/suspicious_login/lib/Service/DataLoader.php",
                "line": 131,
                "function": "merge",
                "class": "Rubix\\ML\\Datasets\\Labeled",
                "type": "->"
            },
            {
                "file": "/var/www/html/apps/suspicious_login/lib/Service/TrainService.php",
                "line": 71,
                "function": "generateRandomShuffledData",
                "class": "OCA\\SuspiciousLogin\\Service\\DataLoader",
                "type": "->"
            },
            {
                "file": "/var/www/html/apps/suspicious_login/lib/BackgroundJob/TrainJobIpV4.php",
                "line": 69,
                "function": "train",
                "class": "OCA\\SuspiciousLogin\\Service\\TrainService",
                "type": "->"
            },
            {
                "file": "/var/www/html/lib/public/BackgroundJob/Job.php",
                "line": 81,
                "function": "run",
                "class": "OCA\\SuspiciousLogin\\BackgroundJob\\TrainJobIpV4",
                "type": "->"
            },
            {
                "file": "/var/www/html/lib/public/BackgroundJob/TimedJob.php",
                "line": 102,
                "function": "start",
                "class": "OCP\\BackgroundJob\\Job",
                "type": "->"
            },
            {
                "file": "/var/www/html/lib/public/BackgroundJob/TimedJob.php",
                "line": 92,
                "function": "start",
                "class": "OCP\\BackgroundJob\\TimedJob",
                "type": "->"
            },
            {
                "file": "/var/www/html/cron.php",
                "line": 152,
                "function": "execute",
                "class": "OCP\\BackgroundJob\\TimedJob",
                "type": "->"
            }
        ],
        "File": "/var/www/html/apps/suspicious_login/vendor/rubix/ml/src/Datasets/Labeled.php",
        "Line": 364,
        "message": "Caught unknown error during IPv4 background training",
        "CustomMessage": "Caught unknown error during IPv4 background training"
    },
    "id": "__REMOVED__"
}

It seems this error only started recently, after upgrading to the latest Nextcloud.

sbrodriguez commented 5 months ago

Any update on this issue? I see the same error running in a Docker container on x86 architecture with Ubuntu 22.04 server.

aodtcr commented 5 months ago

I am also having this issue for IPv6, running Nextcloud 28.0.5.1 on Ubuntu 20.04.6 LTS server. Nginx: 1.25.5 PHP: 8.2.18 MariaDB: 15.1 Distrib 10.6.17

{
  "reqId": "XXXX",
  "level": 3,
  "time": "2024-05-07T07:12:34+02:00",
  "remoteAddr": "",
  "user": "--",
  "app": "suspicious_login",
  "method": "",
  "url": "--",
  "message": "Caught unknown error during IPv6 background training",
  "userAgent": "--",
  "version": "28.0.5.1",
  "exception": {
    "Exception": "Rubix\\ML\\Exceptions\\InvalidArgumentException",
    "Message": "Datasets must have the same number of columns, 80 expected, but 16 given.",
    "Code": 0,
    "Trace": [
      {
        "file": "/var/www/nextcloud/apps/suspicious_login/lib/Service/DataLoader.php",
        "line": 131,
        "function": "merge",
        "class": "Rubix\\ML\\Datasets\\Labeled",
        "type": "->"
      },
      {
        "file": "/var/www/nextcloud/apps/suspicious_login/lib/Service/TrainService.php",
        "line": 71,
        "function": "generateRandomShuffledData",
        "class": "OCA\\SuspiciousLogin\\Service\\DataLoader",
        "type": "->"
      },
      {
        "file": "/var/www/nextcloud/apps/suspicious_login/lib/BackgroundJob/TrainJobIpV6.php",
        "line": 69,
        "function": "train",
        "class": "OCA\\SuspiciousLogin\\Service\\TrainService",
        "type": "->"
      },
      {
        "file": "/var/www/nextcloud/lib/public/BackgroundJob/Job.php",
        "line": 81,
        "function": "run",
        "class": "OCA\\SuspiciousLogin\\BackgroundJob\\TrainJobIpV6",
        "type": "->"
      },
      {
        "file": "/var/www/nextcloud/lib/public/BackgroundJob/TimedJob.php",
        "line": 102,
        "function": "start",
        "class": "OCP\\BackgroundJob\\Job",
        "type": "->"
      },
      {
        "file": "/var/www/nextcloud/lib/public/BackgroundJob/TimedJob.php",
        "line": 92,
        "function": "start",
        "class": "OCP\\BackgroundJob\\TimedJob",
        "type": "->"
      },
      {
        "file": "/var/www/nextcloud/cron.php",
        "line": 152,
        "function": "execute",
        "class": "OCP\\BackgroundJob\\TimedJob",
        "type": "->"
      }
    ],
    "File": "/var/www/nextcloud/apps/suspicious_login/vendor/rubix/ml/src/Datasets/Labeled.php",
    "Line": 364,
    "message": "Caught unknown error during IPv6 background training",
    "CustomMessage": "Caught unknown error during IPv6 background training"
  },
  "id": "XXXX"
}
vladbejenaru commented 4 months ago

Getting a similar one, self-hosting Nextcloud AIO behind a SWAG reverse proxy.

{
  "reqId": "CG8PdHAICgk2bVUekKZK",
  "level": 3,
  "time": "2024-05-13T13:59:55+00:00",
  "remoteAddr": "",
  "user": "--",
  "app": "suspicious_login",
  "method": "",
  "url": "--",
  "message": "Caught unknown error during IPv4 background training",
  "userAgent": "--",
  "version": "28.0.5.1",
  "exception": {
    "Exception": "Rubix\\ML\\Exceptions\\InvalidArgumentException",
    "Message": "Datasets must have the same number of columns, 48 expected, but 16 given.",
    "Code": 0,
    "Trace": [
      {
        "file": "/var/www/html/apps/suspicious_login/lib/Service/DataLoader.php",
        "line": 131,
        "function": "merge",
        "class": "Rubix\\ML\\Datasets\\Labeled",
        "type": "->",
        "args": [
          [
            "Rubix\\ML\\Datasets\\Labeled"
          ]
        ]
      },
      {
        "file": "/var/www/html/apps/suspicious_login/lib/Service/TrainService.php",
        "line": 71,
        "function": "generateRandomShuffledData",
        "class": "OCA\\SuspiciousLogin\\Service\\DataLoader",
        "type": "->",
        "args": [
          [
            "OCA\\SuspiciousLogin\\Service\\CollectedData"
          ],
          [
            "OCA\\SuspiciousLogin\\Service\\MLP\\Config"
          ],
          [
            "OCA\\SuspiciousLogin\\Service\\Ipv4Strategy"
          ]
        ]
      },
      {
        "file": "/var/www/html/apps/suspicious_login/lib/BackgroundJob/TrainJobIpV4.php",
        "line": 69,
        "function": "train",
        "class": "OCA\\SuspiciousLogin\\Service\\TrainService",
        "type": "->",
        "args": [
          [
            "OCA\\SuspiciousLogin\\Service\\MLP\\Config"
          ],
          [
            "OCA\\SuspiciousLogin\\Service\\TrainingDataConfig"
          ],
          [
            "OCA\\SuspiciousLogin\\Service\\Ipv4Strategy"
          ]
        ]
      },
      {
        "file": "/var/www/html/lib/public/BackgroundJob/Job.php",
        "line": 81,
        "function": "run",
        "class": "OCA\\SuspiciousLogin\\BackgroundJob\\TrainJobIpV4",
        "type": "->",
        "args": [
          null
        ]
      },
      {
        "file": "/var/www/html/lib/public/BackgroundJob/TimedJob.php",
        "line": 102,
        "function": "start",
        "class": "OCP\\BackgroundJob\\Job",
        "type": "->",
        "args": [
          [
            "OC\\BackgroundJob\\JobList"
          ]
        ]
      },
      {
        "file": "/var/www/html/lib/public/BackgroundJob/TimedJob.php",
        "line": 92,
        "function": "start",
        "class": "OCP\\BackgroundJob\\TimedJob",
        "type": "->",
        "args": [
          [
            "OC\\BackgroundJob\\JobList"
          ]
        ]
      },
      {
        "file": "/var/www/html/cron.php",
        "line": 152,
        "function": "execute",
        "class": "OCP\\BackgroundJob\\TimedJob",
        "type": "->",
        "args": [
          [
            "OC\\BackgroundJob\\JobList"
          ],
          [
            "OC\\Log"
          ]
        ]
      }
    ],
    "File": "/var/www/html/apps/suspicious_login/vendor/rubix/ml/src/Datasets/Labeled.php",
    "Line": 364,
    "message": "Caught unknown error during IPv4 background training",
    "CustomMessage": "Caught unknown error during IPv4 background training"
  },
  "id": "664349af1edda"
}
foegra commented 4 months ago

Same here

theDepart3d commented 4 months ago

Does anyone have an idea what could be causing this issue?

Are the devs, wining ?

This issue seems/looks to be reproducible.

DerDreschner commented 4 months ago

Have the same issue on my installation. NC 29.0.1 on Debian 12.

ChristophWurst commented 4 months ago

This issue seems/looks to be reproducible.

List the reproduction steps and I'll check it out.

SjoerdV commented 4 months ago

I think it might have something to do with the absence of enough training data

  1. running ncc suspiciouslogin:train -> gives 'Datasets must have the same number of columns, 48 expected, but 16 given.'
  2. running ncc suspiciouslogin:train --now gives 'Insufficient data: Not enough data for the specified maximum age'
muchachagrande commented 4 months ago

I have that same output, but when using IPv6:

  1. occ suspiciouslogin:train --v6 ----> Datasets must have the same number of columns, 80 expected, but 16 given.
  2. occ suspiciouslogin:train --v6 --now ----> Insufficient data: Not enough data for the specified maximum age

Using IPv4:

  1. occ suspiciouslogin:train ----> Prescision(y): 0.94174757281553 Prescision(n): 0.93333333333333 Recall(y): 0.93269230769231 Recall(n): 0.94230769230769 Model and estimator persisted.

  2. occ suspiciouslogin:train --now ----> Insufficient data: Not enough data for the specified maximum age
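
For readers unfamiliar with the metrics in the successful IPv4 run above, here is a small worked example of how precision and recall are computed. The four counts are an assumption: they are not given anywhere in this thread and were chosen only because they reproduce the printed values. The app's actual validation set may differ.

```python
def precision(correct: int, wrongly_assigned: int) -> float:
    """Share of samples predicted as a class that really belong to it."""
    return correct / (correct + wrongly_assigned)


def recall(correct: int, missed: int) -> float:
    """Share of samples of a class that were predicted as that class."""
    return correct / (correct + missed)


# Assumed confusion-matrix counts (hypothetical, fitted to the output above).
Y_AS_Y = 97  # class "y" predicted as "y"
Y_AS_N = 7   # class "y" predicted as "n"
N_AS_N = 98  # class "n" predicted as "n"
N_AS_Y = 6   # class "n" predicted as "y"

precision_y = precision(Y_AS_Y, N_AS_Y)  # 97 / 103
precision_n = precision(N_AS_N, Y_AS_N)  # 98 / 105
recall_y = recall(Y_AS_Y, Y_AS_N)        # 97 / 104
recall_n = recall(N_AS_N, N_AS_Y)        # 98 / 104

print(precision_y, precision_n, recall_y, recall_n)
```

Under these assumed counts, each class has 104 samples, which would mean the validation set was balanced between the two labels.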

DerDreschner commented 4 months ago

I think it might have something to do with the absence of enough training data

Mhm, yeah, I have this feeling as well. After the error occurred on my instance, I deleted all entries in the oc_login_address, oc_login_flow_v2 (was empty and is still empty), oc_login_ips_aggregated, oc_suspicious_login and oc_suspicious_login_model database tables to reset everything regarding the plugin.

Right now, after a bit more than a week, I've got ~5500 recorded logins with 16 IPs/tuples. That's even more than I had before cleaning up the database tables, without any change in usage (just me and my father). :thinking: Maybe there is something broken with the data recording?

Another piece of information that may be interesting to the developers: I had the plugin disabled for quite some time in between. I had https://github.com/nextcloud/mail installed as well, which left the suspicious_login plugin broken due to an old RubixML version being loaded into memory. After identifying the issue and disabling/deleting the mail plugin, I re-enabled the suspicious_login plugin. That means my datasets were not continuous.

As one way to debug the error, I would personally start by inspecting the oc_login_ips_aggregated table and check if everything is there that is expected to be there. Unfortunately, I have no "broken" instance right now to do that.

theDepart3d commented 3 months ago

Mhm, yeah, I have this feeling as well. After the error occurred on my instance, I deleted all entries in the oc_login_address, oc_login_flow_v2 (was empty and is still empty), oc_login_ips_aggregated, oc_suspicious_login and oc_suspicious_login_model database tables to reset everything regarding the plugin.

Without any changes this is in my tables:

As one way to debug the error, I would personally start by inspecting the oc_login_ips_aggregated table and check if everything is there that is expected to be there. Unfortunately, I have no "broken" instance right now to do that.

That's the issue: the tables look normal. The instance isn't really "broken"; it just fails to log most of the login data.

The latest log data shows that there are no models available: no_models_found

theDepart3d commented 3 months ago

@ChristophWurst I had some time to set up a new VM and Nextcloud instance.

To reproduce the issue simply do a clean install (latest version). Install recommended apps. Enable suspicious login app. Then run the following:

occ suspiciouslogin:train
# or
occ suspiciouslogin:train --now 

Both commands return:

Using ipv4 strategy
Not enough data, try again later (Insufficient data: Not enough data for the specified maximum age)

Meaning there is no model available.

Can you perhaps provide a working model from your setup?

SjoerdV commented 3 months ago

@theDepart3d "To reproduce the issue simply do a clean install."

I am sorry, but you are not reproducing the error stated. The issue occurs when there is data present in the mentioned tables.

Simply cleaning and/or starting fresh does not guarantee that the issue will not occur once data is being written to the mentioned tables.

theDepart3d commented 3 months ago

@SjoerdV Interesting. The Nextcloud instance that has the error simply does not populate the tables, and I have no clue why.

The new VM is not publicly accessible. I'll get a domain later and leave it running to populate the data, if it populates the data at all.

The VM with the errors also shows the same error, "Not enough data, try again later". That instance is publicly available, but it does not populate the tables.

It uses a model, but where does the default model get stored? Or does it create a model from the populated data?

SjoerdV commented 3 months ago

@theDepart3d Yes, you have to wait (and actually use the instance), to have the tables populate.

Please do review the original error as posted by the OP. It contains: 'Datasets must have the same number of columns, 48 expected, but 16 given.' <- the actual topic of this issue.

It happens during training on data collected and stored in the mentioned tables.
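
To illustrate what the error message itself means, here is a toy sketch of a merge that refuses datasets of different widths. This is not Rubix ML's code and not the app's `DataLoader`; `merge_datasets` is a hypothetical stand-in that only mirrors the wording of the message, using the 48 and 16 column counts from the IPv4 reports.

```python
from typing import List, Sequence

Row = Sequence[float]


def merge_datasets(first: List[Row], second: List[Row]) -> List[Row]:
    """Concatenate two row-major datasets, refusing mismatched widths."""
    if first and second:
        expected = len(first[0])
        given = len(second[0])
        if expected != given:
            raise ValueError(
                "Datasets must have the same number of columns, "
                f"{expected} expected, but {given} given."
            )
    return list(first) + list(second)


# Widths taken from the IPv4 error in this thread; the values are dummies.
wide = [[0.0] * 48 for _ in range(3)]
narrow = [[0.0] * 16 for _ in range(2)]

try:
    merge_datasets(wide, narrow)
except ValueError as error:
    print(error)
```

The sketch only shows that one of the two datasets being merged during training has 16 columns per sample instead of 48 (or 80 for IPv6). Why that narrower dataset is produced is not established in this thread.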

mwdle commented 2 months ago

+1 on Nextcloud Hub 8 (29.0.3). This has been populating my log files with errors for a while...

kem-a commented 2 months ago

The same problem.

occ suspiciouslogin:train
Using ipv4 strategy
In Labeled.php line 364:
  Datasets must have the same number of columns, 48 expected, but 16 given.

And

occ suspiciouslogin:train --now  
Using ipv4 strategy
Not enough data, try again later (Insufficient data: Not enough data for the specified maximum age)

I have 4 active users in my Nextcloud instance. I think it is safe to presume that this app is useless below a certain number of users, since it will never generate enough login attempts to actually train the model in a meaningful way.

I think this app should be removed as default from Nextcloud. Also, it would be very helpful if the devs could put information on the main page about the minimum amount of data required to train the model.

ChristophWurst commented 2 months ago

I'm using the app on a three-person instance without problems. If your users always log in from the same IP, there won't be a lot of data indeed. But as soon as someone uses a phone to connect, there's some variance in IP addresses and plenty of data.

Also, it would be very helpful if the devs could put information on the main page about the minimum amount of data required to train the model.

Sure: https://github.com/nextcloud/suspicious_login?tab=readme-ov-file#neural-net

joshtrichards commented 2 months ago

I think this app should be removed as default from Nextcloud.

It's not a default app. It's shipped, but disabled by default.

DerDreschner commented 2 months ago

I have 4 active users in my Nextcloud instance. I think it is safe to presume that this app is useless below a certain number of users, since it will never generate enough login attempts to actually train the model in a meaningful way.

I use it with only 2 users and have had no more problems so far after my "reset" described here. Over 100 IPs with ~40.000 logins have been collected so far. But the plugin has only had enough information to train the model and be active for 8 days so far. I will report here if the problem reappears.

kraizelburg commented 2 months ago

I have 4 active users in my Nextcloud instance. I think it is safe to presume that this app is useless below a certain number of users, since it will never generate enough login attempts to actually train the model in a meaningful way.

I use it with only 2 users and have had no more problems so far after my "reset" described here. Over 100 IPs with ~40.000 logins have been collected so far. But the plugin has only had enough information to train the model and be active for 8 days so far. I will report here if the problem reappears.

Could you please elaborate on how you solved the problem? I am using Nextcloud in a Docker container with only 2 active users, and I am seeing multiple errors daily in this regard. Thanks.

mwdle commented 2 months ago

@kraizelburg you would need to access your Nextcloud DB and clear all the rows from each of the following tables: oc_login_address, oc_login_flow_v2, oc_login_ips_aggregated, oc_suspicious_login and oc_suspicious_login_model. I tried this, but I only have a single user and have only had ~2700 logins from 8 IP addresses. So I can't comment yet on whether this fix works, or whether it just stops the issue temporarily because there isn't yet enough training data for the suspicious login notifications to take effect.
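
The reset described above can be sketched as a small script generator. It only builds SQL text and does not connect to any database. The table names are the ones listed in this thread; the `oc_` prefix is an assumption that depends on the installation's table prefix, and the helper name `reset_script` is hypothetical. By default it emits only row counts, so the tables can be inspected before anything is deleted.

```python
import re

# Table names as listed in this thread (prefix "oc_" assumed).
TABLES = (
    "oc_login_address",
    "oc_login_flow_v2",
    "oc_login_ips_aggregated",
    "oc_suspicious_login",
    "oc_suspicious_login_model",
)


def reset_script(tables=TABLES, dry_run=True) -> str:
    """Build SQL that counts rows and, unless dry_run, deletes them."""
    lines = []
    for table in tables:
        # Refuse anything that is not a plain identifier.
        if not re.fullmatch(r"[A-Za-z0-9_]+", table):
            raise ValueError(f"unexpected table name: {table!r}")
        lines.append(f"SELECT COUNT(*) FROM {table};")
        if not dry_run:
            lines.append(f"DELETE FROM {table};")
    return "\n".join(lines)


print(reset_script())                # inspection only
print(reset_script(dry_run=False))   # counts followed by deletes
```

Whether clearing these tables fixes the error permanently is still open in this thread, so taking a database backup before running the delete statements seems prudent.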

vgdh commented 1 month ago

I have the same issue.