netdata / netdata-cloud

The public repository of Netdata Cloud. Contribute with bug reports and feature requests.
GNU General Public License v3.0

[Feat] Use-cases]: Monitoring OPC UA with Netdata #562

Open shyamvalsan opened 1 year ago

shyamvalsan commented 1 year ago

Problem

Description

OPC UA is an open, industry-independent, secure connectivity framework for industrial automation data. It is designed for use across a wide range of industrial sectors.

Industrial plants have a large variety of machines and sensors that need to be monitored for safety, maintenance and operational efficiency. Easy and efficient access to this data will considerably improve the R&D efficiency of the companies operating these plants. Maintenance teams will be able to develop more efficient maintenance plans, and process engineers will be able to optimize their production lines. ML and AI use-cases will also become feasible with access to high-fidelity, reliable monitoring data.

There should be a Netdata collector that can connect to OPC UA server(s) and collect all the associated metric information (tags) from it.

Here are some useful links to get started:

Importance

really want

Value proposition

  1. Opens up a new market niche for Netdata. Thousands of companies operate industrial automation systems/PLCs; if Netdata can offer a simple, flexible, feature-rich and cost-effective way to monitor these systems/machines, there is potential for a lot of connected nodes in the future.

shyamvalsan commented 1 year ago

This feature was requested by a user. Here is some feedback from a discussion I had with them.

  • Works as IS/IT coordinator in automation at a large international manufacturer of heavy trucks, which has many manufacturing plants producing components such as engines and gearboxes, as well as assembly lines.
  • Has a large number (~400) of CNC machines, heat treatment furnaces (lots of sensors for temperature, pressure, atmosphere chemical composition, oil baths, etc.), robots, etc. Most equipment is based on Siemens PLCs, though other brands exist of course.
  • Current tools to fetch and analyze machine process data are too difficult to use and maintain. E.g., a Kepware OPC proxy is used to send data to a data lake/database. To do this, we have to configure the Kepware proxy with each specific signal, its data type, and how and where to send it. Production engineers can't do this themselves; they have to request it from our internal IT department.
  • Trying to find better ways to provide actual machine data so that we can be much more agile, and also to provide the maintenance department with better information so they can do predictive/condition-based maintenance instead of time/schedule-based maintenance.
  • Goals are to provide information to process engineers so they can optimize their production machines/lines and part quality, and to provide the maintenance department with enough information to develop much more efficient maintenance plans. Machine learning and AI need as much information as possible to be effective.
  • Monitoring would have to be done remotely over Ethernet. Need a node that can fetch the data and deliver it to a parent. Possibly need several such nodes, since there are several hundred machines and a machine can have 1000-30000 "tags" that could be monitored. The collector should be able to access several OPC UA servers (machines); otherwise we'd need a swarm of collectors, which would need more resources and would be harder to maintain as an infrastructure.

cc: @ktsaou @cakrit @sashwathn @amalkov @ralphm

amalkov commented 1 year ago

I believe this is a good opportunity to step into the manufacturers' ecosystem. The outcome of this work could be a paid support plan. It would be good to analyse the effort and implementation complexity.

We probably just need to implement a couple of collectors and let them be driven by the community, to validate the need.

shyamvalsan commented 1 year ago

If we build the collector and have a guide to using it, we could test the waters by sharing it with https://www.reddit.com/r/PLC/ and see how the community receives it.

ilyam8 commented 1 year ago

@thiagoftsm can you share your thoughts before starting to implement something? At the moment I have zero understanding of what OPC UA is and what the ways to collect metrics are, but I googled "go opc ua" and found https://github.com/gopcua/opcua.

thiagoftsm commented 1 year ago

Thank you for the link @ilyam8! As soon as I finish the eBPF work I am doing right now, I will share data and details about what we can do :handshake:.

thiagoftsm commented 1 year ago

@shyamvalsan the Python examples you used are not async examples; we will have to use the async version of the OPC UA library, https://github.com/FreeOpcUa/opcua-asyncio, instead.

I know we will write the collector in Go; I am only pointing out that OPC servers have two modes.

thiagoftsm commented 1 year ago

@shyamvalsan about the OPC UA metrics: it looks like getting everything from the server is not recommended, because the protocol was not designed for this, as you can see here, and here.

thiagoftsm commented 1 year ago

Hello,

Last week I finished the work with Python to understand how OPC UA works (server, client, protocol). This week I am shifting to Go, because the Python library mentioned in the original post has limitations that do not allow us to get all the metrics we need, and of course the plugin will be written with another library.

During the Python development I observed that:

Best regards!

shyamvalsan commented 1 year ago

@thiagoftsm

I was thinking that namespaces should be correlated to jobs, so that each namespace will have a separate section in Netdata to itself.
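
A minimal sketch of the namespace-to-job idea above, assuming node IDs arrive in the standard OPC UA string form (`ns=<index>;<type>=<identifier>`). The `opcua_ns<N>` job naming and the sample tag names are hypothetical, not an existing Netdata convention.

```python
import re
from collections import defaultdict

# String form of a NodeId, e.g. "ns=2;i=1001" or "ns=3;s=Line1.Temp".
# When the "ns=" part is omitted, the namespace index is 0.
NODE_ID_RE = re.compile(r"^(?:ns=(\d+);)?([isgb])=(.+)$")


def namespace_index(node_id: str) -> int:
    """Return the namespace index encoded in a NodeId string."""
    match = NODE_ID_RE.match(node_id)
    if match is None:
        raise ValueError(f"not a NodeId string: {node_id!r}")
    return int(match.group(1) or 0)


def group_by_namespace(node_ids):
    """Group NodeId strings into one hypothetical job per namespace."""
    jobs = defaultdict(list)
    for node_id in node_ids:
        # "opcua_ns<N>" is an illustrative job name, not a Netdata convention.
        jobs[f"opcua_ns{namespace_index(node_id)}"].append(node_id)
    return dict(jobs)


if __name__ == "__main__":
    sample = ["ns=2;s=Furnace1.Temp", "i=2258", "ns=2;i=1001", "ns=3;s=Robot.Axis1"]
    for job, nodes in sorted(group_by_namespace(sample).items()):
        print(job, nodes)
```

Each resulting job could then map to its own section on the dashboard, which keeps server-internal nodes (namespace 0) apart from machine tags.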

thiagoftsm commented 1 year ago

@shyamvalsan after I discuss your points with users, I will bring another update.

thiagoftsm commented 1 year ago

During the tests I reached an OPC UA server that does not allow querying all nodes. Considering this scenario, the safest option looks to be querying IDs that are always present. The whole list is available in this link, with the prefix UA_NS0ID.
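
As an illustration of "IDs that are always present", the sketch below lists a handful of numeric identifiers that the OPC UA specification defines in namespace 0 and turns them into NodeId strings. The selection and the helper name `ns0_node_id` are assumptions for illustration; which of these a collector would actually read is still open.

```python
# A few standard namespace-0 numeric identifiers from the OPC UA specification.
# This subset is illustrative only; the full list is much longer.
WELL_KNOWN_NS0 = {
    "RootFolder": 84,
    "ObjectsFolder": 85,
    "Server": 2253,
    "Server_NamespaceArray": 2255,
    "Server_ServerStatus": 2256,
    "Server_ServerStatus_StartTime": 2257,
    "Server_ServerStatus_CurrentTime": 2258,
    "Server_ServerStatus_State": 2259,
}


def ns0_node_id(name: str) -> str:
    """Build the NodeId string for a well-known namespace-0 node (hypothetical helper)."""
    return f"ns=0;i={WELL_KNOWN_NS0[name]}"


if __name__ == "__main__":
    for name in WELL_KNOWN_NS0:
        print(f"{name:35s} {ns0_node_id(name)}")
```

Reading `Server_NamespaceArray` first would also tell the collector which namespaces a given server exposes.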

thiagoftsm commented 1 year ago

Since my last message I ran different tests with different OPC servers and with a specific PLC emulator developed by Microsoft. I ran the emulator with the following arguments:

docker run --rm -it -p 50000:50000 -p 8080:8080 --name opcplc mcr.microsoft.com/iotedge/opc-plc:latest --pn=50000 --autoaccept --sph --sn=5 --sr=10 --st=uint --fn=5 --fr=1 --ft=uint --ctb --scn --lid --lsn --ref --gn=5 --ut --aa --to

When I requested all variables from the Microsoft PLC, I got this result using the Python library, because the Go client does not allow me to connect to any server and request all nodes (ns=0;i=84):

bash-5.1$ go run examples/read/read.go -endpoint opc.tcp://localhost:50000 -node 'ns=0;i=84'
Status not OK: The attribute is not supported for the specified Node. StatusBadAttributeIDInvalid (0x80350000)
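
For reading results like the one above: OPC UA status codes are 32-bit values whose two most significant bits carry the severity (Good, Uncertain, Bad). The sketch below decodes that field for the code printed by the Go example; the function names are hypothetical and this is not part of gopcua or any other library.

```python
# Value printed in the output above (StatusBadAttributeIDInvalid).
BAD_ATTRIBUTE_ID_INVALID = 0x80350000


def severity(status_code: int) -> str:
    """Decode the severity from the top two bits of a 32-bit OPC UA status code."""
    top_bits = (status_code >> 30) & 0b11
    return {0b00: "Good", 0b01: "Uncertain", 0b10: "Bad"}.get(top_bits, "Reserved")


def is_good(status_code: int) -> bool:
    """True when the value attached to this status code can be used as-is."""
    return severity(status_code) == "Good"


if __name__ == "__main__":
    print(hex(BAD_ATTRIBUTE_ID_INVALID), severity(BAD_ATTRIBUTE_ID_INVALID))
```

A collector could use a check like this to skip nodes whose read returns a Bad status instead of aborting the whole collection cycle.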

As we discussed in our meeting, I am going to send an e-mail to our user requesting a real environment to test with, and I will also report the issue in the gopcua repo.

Forza-tng commented 1 year ago

Hi, I just wanted to chime in on the value proposition. At my company we have lots and lots of UA-capable devices (CNC machines, robots, heat treatment, and other manufacturing equipment), and although there are plenty of commercial tools to gather data from these, they are usually focused on driving MES and ERP systems, or on gathering specific data.

The standard tools work well when we know exactly what data/signals we need. Then it is a matter of selecting the correct source and sending it to the right recipient/system.

My interest here is to find better ways to gather data broadly, visualise what the data looks like, and provide ways to quickly look through thousands of signals/data sources. Netdata is very capable and can easily graph thousands of metrics in an easy-to-use interface.

My goals are several.

thiagoftsm commented 1 year ago

@ilyam8 and @shyamvalsan I am adding here an example from a demo IoT environment. As you can see, the majority of the metrics are not defined and we won't use them.

Right now my expectation is that in a real environment, metrics not related to the server will be listed under a different namespace (ns=2 or higher), and we will focus our collection on them.
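
A small sketch of that filtering rule, assuming readings arrive as `(node_id, value)` pairs with NodeIds in the standard string form. The function name, the `min_namespace` parameter and the sample tags are hypothetical; undefined or non-numeric values are dropped, matching the "not defined" metrics mentioned above.

```python
def select_metrics(readings, min_namespace=2):
    """Keep numeric readings whose NodeId lives in namespace >= min_namespace.

    readings: iterable of (node_id, value) pairs, e.g. ("ns=2;s=Furnace1.Temp", 845.5).
    """
    selected = {}
    for node_id, value in readings:
        # "ns=<index>;..." carries the namespace; without it the index is 0.
        if node_id.startswith("ns="):
            namespace = int(node_id.split(";", 1)[0][3:])
        else:
            namespace = 0
        if namespace < min_namespace:
            continue  # server-related nodes, skipped by assumption
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue  # undefined (None) or non-numeric values cannot be charted
        selected[node_id] = value
    return selected


if __name__ == "__main__":
    sample = [
        ("ns=0;i=2258", 1.0),
        ("ns=2;s=Furnace1.Temp", 845.5),
        ("ns=2;s=Furnace1.Name", "F1"),
        ("ns=3;i=7", None),
        ("ns=3;i=8", 12),
    ]
    print(select_metrics(sample))
```

Keeping the threshold as a parameter leaves room to revisit the "ns=2 or higher" expectation once a real environment is available for testing.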