microsoft / service-fabric

Service Fabric is a distributed systems platform for packaging, deploying, and managing stateless and stateful distributed applications and containers at large scale.
https://docs.microsoft.com/en-us/azure/service-fabric/
MIT License

[BUG] - Cannot start solution from VS on local cluster #1505

Closed mcgiany closed 4 months ago

mcgiany commented 4 months ago

Hi,

I have a stateless service Web API application, but all of a sudden I cannot run or debug it.

My local cluster version is 10.1.1951.9590. I have this version of SF Kestrel: <PackageReference Include="Microsoft.ServiceFabric.AspNetCore.Kestrel" Version="7.0.1816" />

The only errors I can find in the events are:

{
  "Timestamp": "2024-07-23T08:17:48.0350646+02:00",
  "ProviderName": "Microsoft-ServiceFabric",
  "Id": 54422,
  "Message": "EventName: PartitionNewHealthReport Category: Health EventInstanceId: 0890fe31-df73-4792-bae5-e3d96662e83e PartitionId: 4b5cb5c1-264f-4a3a-8e46-203a3df43db7 SourceId=System.CRM Property=ServiceReplicaUnplacedHealth_Secondary_4b5cb5c1-264f-4a3a-8e46-203a3df43db7 HealthState=Warning TTL=65000 SequenceNumber=133661890680195833 Description='The Cluster Resource Manager was unable to find a placement for one or more of the Service's Replicas:
    Secondary replica could not be placed due to the following constraints and properties:  
    TargetReplicaSetSize: 1
    Placement Constraint: N/A
    Parent Service: N/A
    Constraint Elimination Sequence:
    Down nodes count 0, Deactivated nodes count 0, Deactivating nodes count 0, NodesPendingClose count 0
    ServiceTypeDisabled/NodesBlockListed eliminated 1 possible node(s) for placement -- 0/1 node(s) remain.
    ' RemoveWhenExpired=True SourceUTCTimestamp=07/23/2024 08:17:48 ",
  "ProcessId": 28488,
  "Level": "Informational",
  "Keywords": "0x4000000000000001",
  "EventName": "HM",
  "ActivityID": null,
  "RelatedActivityID": null,
  "Payload": {
    "eventName": "PartitionNewHealthReport",
    "category": "Health",
    "eventInstanceId": "\"0890fe31-df73-4792-bae5-e3d96662e83e\"",
    "partitionId": "\"4b5cb5c1-264f-4a3a-8e46-203a3df43db7\"",
    "sourceId": "System.CRM",
    "property": "ServiceReplicaUnplacedHealth_Secondary_4b5cb5c1-264f-4a3a-8e46-203a3df43db7",
    "healthState": 2,
    "TTLtimespan": 65000,
    "sequenceNumber": 133661890680195833,
    "description": "The Cluster Resource Manager was unable to find a placement for one or more of the Service's Replicas:
      Secondary replica could not be placed due to the following constraints and properties:  
      TargetReplicaSetSize: 1
      Placement Constraint: N/A
      Parent Service: N/A
      Constraint Elimination Sequence:
      Down nodes count 0, Deactivated nodes count 0, Deactivating nodes count 0, NodesPendingClose count 0
      ServiceTypeDisabled/NodesBlockListed eliminated 1 possible node(s) for placement -- 0/1 node(s) remain.
      ",
    "removeWhenExpired": true,
    "sourceUtcTimestamp": "\"\/Date(1721715468019)\/\"",
    "EventType": "_PartitionsOps_ProcessPartitionReportOperational"
  }
}
{
  "Timestamp": "2024-07-23T08:30:18.3934976+02:00",
  "ProviderName": "MyServiceName",
  "Id": 0,
  "Message": "ERROR: Exception in Command Processing for EventSource MyServiceName: Use of undefined keyword value 0x2 for event ServiceTypeRegistered.",
  "ProcessId": 44892,
  "Level": "Always",
  "Keywords": "0xFFFFFFFFFFFFFFFF",
  "EventName": "EventSourceMessage",
  "ActivityID": null,
  "RelatedActivityID": null,
  "Payload": {
    "message": "ERROR: Exception in Command Processing for EventSource MyServiceName: Use of undefined keyword value 0x2 for event ServiceTypeRegistered."
  }
}

The messages are not very useful.
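For anyone reading the first event: the only line that narrows things down is the "Constraint Elimination Sequence" entry, which says the single local node was ruled out by ServiceTypeDisabled/NodesBlockListed. A minimal sketch of pulling that line out of the description text follows. `parse_eliminations` is a hypothetical helper, not part of any Service Fabric SDK, and the regex only assumes the wording seen in this one event.

```python
import re

# Description text copied from the PartitionNewHealthReport event above.
description = """The Cluster Resource Manager was unable to find a placement for one or more of the Service's Replicas:
Secondary replica could not be placed due to the following constraints and properties:
TargetReplicaSetSize: 1
Placement Constraint: N/A
Parent Service: N/A
Constraint Elimination Sequence:
Down nodes count 0, Deactivated nodes count 0, Deactivating nodes count 0, NodesPendingClose count 0
ServiceTypeDisabled/NodesBlockListed eliminated 1 possible node(s) for placement -- 0/1 node(s) remain.
"""

# Matches lines shaped like:
# "<constraint> eliminated N possible node(s) for placement -- R/T node(s) remain."
ELIMINATION = re.compile(
    r"^\s*(?P<constraint>\S+) eliminated (?P<eliminated>\d+) possible node\(s\) "
    r"for placement -- (?P<remaining>\d+)/(?P<total>\d+) node\(s\) remain\.",
    re.MULTILINE,
)


def parse_eliminations(text):
    """Return one dict per constraint-elimination line found in text."""
    return [
        {
            "constraint": m.group("constraint"),
            "eliminated": int(m.group("eliminated")),
            "remaining": int(m.group("remaining")),
            "total": int(m.group("total")),
        }
        for m in ELIMINATION.finditer(text)
    ]


steps = parse_eliminations(description)
for step in steps:
    print(step)
```

A result with zero remaining nodes means the placement failure is a symptom: the service type is disabled or blocklisted on the only node, so the thing to investigate is why the service host did not come up there, not the Cluster Resource Manager itself.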

mcgiany commented 4 months ago

I think the problem is in the Central Secret Service. The documentation on this page is probably outdated: https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-application-secret-store

This command doesn't work:

Invoke-WebRequest -CertificateThumbprint <ClusterCertThumbprint> -Method POST -Uri "https:<clusterfqdn>/Resources/Secrets/supersecret/values/ver1/list_value?api-version=6.4-preview"

I get this response:

{"Error":{"Code":"0x80090010","Message":"Null"}}
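The error code in that response is a standard Windows HRESULT, so it can at least be decoded even though the message is "Null". Here is a small sketch that splits it into its fields. `decode_hresult` and `KNOWN_CODES` are names made up for this example, and the lookup table holds only the one value from winerror.h that matters here.

```python
import json

# Response body copied from the failing list_value call above.
response_body = '{"Error":{"Code":"0x80090010","Message":"Null"}}'

# Tiny lookup of winerror.h values; deliberately incomplete.
KNOWN_CODES = {0x80090010: "NTE_PERM (Access denied)"}


def decode_hresult(value):
    """Split a 32-bit HRESULT into its failure bit, facility, and code."""
    value &= 0xFFFFFFFF
    return {
        "failure": bool(value >> 31),       # bit 31: 1 means failure
        "facility": (value >> 16) & 0x7FF,  # bits 16-26; 9 is FACILITY_SECURITY
        "code": value & 0xFFFF,             # bits 0-15
    }


hresult = int(json.loads(response_body)["Error"]["Code"], 16)
fields = decode_hresult(hresult)
print(hex(hresult), fields, KNOWN_CODES.get(hresult, "unknown"))
```

NTE_PERM comes from the Windows crypto layer, which suggests (but does not prove) that the request reached the service and failed when it tried to use a private key, rather than failing on the URL or API version.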

mcgiany commented 4 months ago

We finally figured out why it's not working. The problem was with the certificate that we use for the Central Secret Service.

If we create the cert with this script, it doesn't work:

# Usage: pass the certificate name, the PFX password, and the output folder.
param($certname, $password, $path)

# Parameters for New-SelfSignedCertificate (splatted below).
$params = @{
    Subject = "CN=$certname"
    CertStoreLocation = "Cert:\LocalMachine\My"
    KeyExportPolicy = 'Exportable'
    KeySpec = 'Signature'
    KeyLength = 2048
    KeyAlgorithm = 'RSA'
    HashAlgorithm = 'SHA256'
    NotAfter = (Get-Date).AddMonths(24)
}
$cert = New-SelfSignedCertificate @params

# Export the public certificate (.cer) and the password-protected PFX.
Export-Certificate -Cert $cert -FilePath "$path\$certname.cer"
$mypwd = ConvertTo-SecureString -String $password -Force -AsPlainText
Export-PfxCertificate -Cert $cert -FilePath "$path\$certname.pfx" -Password $mypwd

If we use this instead, everything works:

New-SelfSignedCertificate -Type DocumentEncryptionCert -KeyUsage DataEncipherment -Subject mydataenciphermentcert -Provider 'Microsoft Enhanced Cryptographic Provider v1.0'

Both certs look very similar.
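The two certs may look alike in the store, but the two calls that created them differ in several parameters. As a rough sketch, the snippet below diffs only the parameters written out in this thread (cmdlet defaults are not filled in). `diff_params` is a hypothetical helper. My guess, not a confirmed root cause, is that the relevant difference is KeySpec = 'Signature' on the failing cert versus a DocumentEncryptionCert with DataEncipherment key usage on the working one, since a secret store has to decrypt with the key.

```python
# Parameters copied from the two New-SelfSignedCertificate calls in this thread.
failing = {
    "Subject": "CN=$certname",
    "CertStoreLocation": r"Cert:\LocalMachine\My",
    "KeyExportPolicy": "Exportable",
    "KeySpec": "Signature",
    "KeyLength": "2048",
    "KeyAlgorithm": "RSA",
    "HashAlgorithm": "SHA256",
    "NotAfter": "(Get-Date).AddMonths(24)",
}
working = {
    "Type": "DocumentEncryptionCert",
    "KeyUsage": "DataEncipherment",
    "Subject": "mydataenciphermentcert",
    "Provider": "Microsoft Enhanced Cryptographic Provider v1.0",
}


def diff_params(a, b):
    """Return parameters set in only one call, or set to different values."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


differences = diff_params(failing, working)
for name, (bad, good) in differences.items():
    print(f"{name}: failing={bad!r} working={good!r}")
```

A value of None means the parameter was not passed in that call, so the cmdlet default applies. That is worth checking on the real certificates before drawing conclusions.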

[RANT] We spent more than half a day solving this problem because the error handling and error codes in SF suck. Using SF is always a struggle, and I don't recommend it to any developer. This product is a big fail!